#!/usr/bin/env perl
# -*- perl -*-

### ============================================================================
### MAN-PAGE
###

=head1 NAME

B<MLocARNA> - multiple alignment of RNA

=head1 SYNOPSIS

mlocarna [options] <fasta file>

=head1 DESCRIPTION

B<MLocarna> computes a multiple sequence-structure alignment of RNA
sequences. The structure of these sequences does not have to be known
but is inferred based in simultaneous alignment and folding.

Generally, mlocarna takes multiple sequences as input, given in a
fasta file. The fasta file can be extended to specify structure and
anchor constraints that respectively control the possible foldings and
possible alignments. The main outcome is a multiple alignment together
with a consensus structure.

Technically, mlocarna works as front end to the pairwise alignment
tools locarna, locarna_p, and sparse (and even carna), which are
employed to construct the multiple alignment progressively.

Going beyond the basic progressive alignment scheme, Mlocarna
implements probabilistic consistency transformation and iterative
alignment, which are available in probabilistic mode. Moreover, the
LocARNA package provides an alternative multiple alignment tool
"locarnate", which generates alignments based on T-Coffee using
(non-probabilistic) consistency transformation.

=head1 OPTIONS

=head2 Load configurations from file

=over 4

=item B<--configure>=file

Load a parameter set from a configuration file of options and option
value pairs. This enables specifying (sets of) default parameters for
mlocarna, which can still be modified by other options to
mlocarna. Command line arguments always take precedence over this
configuration. Options are specified as single entries per line;
option value pairs, like option: value.  Whitespace and '#'-prefixed
comments are ignored.

=back

=head2 Major alignment modes

By default, mlocarna performs progressive alignment, where the
progressive alignment steps are computed by the pairwise aligner
locarna based on sequences and dot plots (RNAfold -p); subsequently,
partial alignmetns and their consensus dot plots.

=over 4

=item B<--probabilistic>

In probabilistic mode, mlocarna scores alignments using match
probabilities that are computed by a partition function approach
[tech. details: the probability computation is implemented in
locarna_p; the probability-based scoring is performed by locarna in
mea mode]. This enables mlocarna to consistency-transform the
probabilities (option --consistency-transform) and to compute
reliabilities.  The tool reliability-profile.pl is provided to
visualize reliability profiles. Reliabilities can also be used for
iterating the alignment with reliably aligned base pairs as structural
constraints (option --it-reliable-structure).

=item B<--sparse>

Apply the sparsified alignment algorithm SPARSE for all pairwise
alignments (instead of the default pairwise aligner locarna). SPARSE
supports stronger sparsification for faster alignment computation and
increases the structure prediction capabilities over locarna.

=back


=head2 Controlling Output

=over 4

=item  B<--tgtdir>

Target directory. All output files are written to this directory.  Per
default the target directory is generated from the input filename by
replacing suffix fa by (or appending) out.

=item  B<-v, --verbose>

Turn on verbose ouput. Shows progress of computation of all-2-all
pairwise alignments for guide tree computation; shows intermediary
alignments during the progressive alignment computation.

=item  B<--moreverbose>

Be even more verbose: additionally shows parameters for the pairwise
aligner; moreover, the calls and output of the RNA base pair
probability computations as well as the pairwise aligner during
progressive alignment.

=item  B<-q, --quiet>

Be quiet.

=item B<--keep-sequence-order>

Preserve sequence order of the input in the final alignment.  Affects
output to stdout and results/result.aln.

=item B<--stockholm>

Write STOCKHOLM files of all final and intermediate alignments (in
addition to CLUSTALW files).

=item B<--consensus-structure>

Type of consensus structures written to stockholm output (and screen in
verbose modes) [alifold|mea|none] (default: none).  This includes
intermediate alignments of the progressive multiple alignment. If not
explicitly specified othwise, the option alifold-consensus-dp implicitly
sets this to alifold.  Note that the alifold consenus of the final
alignment is computed and printed, regardless of this option.

=item B<-w, --width>=columns (120)

Output width for sequences in clustal-like and stockholm output; note
that the clustalw standard format requires 60 or less.

=item B<--write-structure>

Write guidance structure in output to stdout. This provides some
insight into the influence of structure into the generated pairwise
alignments. The guidance structure shows the base pairs 'predicted' by
each pairwise locarna (or sparse) alignment. These structures should
not be mistaken as predicted consensus structures of multiple
alignments. Consensus structures can be more adequately derived from
the multiple alignment. For this reason, mlocarna reports the
consensus structure by RNAalifold.

=back

=head2 Locality

=over 4

=item  B<--free-endgaps>

Allow free endgaps. (Corresponds to pairwise locarna option --free-endgaps "++++".)

=item  B<--free-endgaps-3>

Allow free endgaps 3'.

=item  B<--free-endgaps-5>

Allow free endgaps 5'.

=item B<--sequ-local>=bool (false)

Turns on/off sequence locality [def=off]. Sequence locality refers to
the usual form of local alignment. If on, mlocarna bases all
calculations on local pairwise alignments, which determin the best
alignments of subsequences (disregarding dissimilar starts and
ends). Note that truely local structure alignments as well as local
multiple alignments are still a matter of research; so don't expect
perfect results in all instances.

=item B<--struct-local>=bool (false)

Turns on/off structure locality [def=off]. Structural locality enables
skipping entire substructures in alignments. In pairwise alignments,
this allows one exclusion of some subsequence in each loop; thus,
guaranteeing that the (structure locally) aligned parts of the
sequences are always connected w.r.t. the predicted structure but not
necessarily consecutive in the sequence. Structure locality does not
imply sequence locality, but rather the two concepts are orthogonal.

=item B<--penalized>=score

Variant of sequence local alignment (cf. --sequ-local), where the
specified penalty score is subtracted for each base in the local
alignment. [Experimental]

=back

=head2 Pairwise alignment and scoring

=over 4


=item B<--indel=score> (-150)

Score of each single base insertion or deletion.

=item B<--indel-opening>=score (-750)

Score of opening an insertion or deletion, i.e. score for a
consecutive run of deletions or insertions. Indel opening score and
indel score define the affine scoring of gaps.

=item B<-m, --match>=score (50)

Score of a base match (unless ribosum-based scoring)

=item B<-M, --mismatch>=score (0)

Score of a base mismatch (unless ribosum-based scoring)

=item B<--use-ribosum>=bool (true)

Use ribosum scores for scoring base matches and base pair matches;
note that tau=0 suppresses any effect on the latter.

=item B<--ribosum-file>=file

File specifying the Ribosum base and base-pair similarities. [default: use RIBOSUM85_60 without requiring a Ribosum file.]

=item B<-s, --struct-weight>=score (200)

Maximum weight of one predicted arc, aka base pair. Note that this
means that the maximum weight of an arc match is twice as high. The
maximum weight is assigned to base pairs with (almost) probability 1
in the dot plot; less probable base pairs receive gradually degrading
scores. The struct-weight factor balances the score contribution from
structure to the score contribution from base similarity scores
(e.g. ribosum scores).

=item B<-e, --exp-prob>=prob

Expected probability of a base pair.

=item B<-t, --tau>=factor (0)

Tau factor in percent. The tau factor controls the contribution of
sequence-dependent scores to the score of arc matches.

=item B<-E, --exclusion>=<score> (0)

Weight of an exclusion, i.e. an ommitted subsequence in a loop, which
applies only to structural local alignment.

=item B<--stacking>

Use stacking terms. In this case, stacked arcs are scored based on
conditional probabilities (conditioned by their stacked inner arc)
rather than unconditioned base pair probabilities. [Experimental]

=item B<--new-stacking>

Use new stacking terms; cf. --stacking. These terms directly award
bonuses to stacking. [Experimental]

=back

=head2 Alignment heuristics

Several parameters are available to speed up the pairwise alignment
computations heuristically. Choosing these parameters reasonably is
necessary to achieve good trade-off between speed and accuracy,
especially for large alignment instances.

=over 4

=item B<-p, --min-prob>=probability (0.001)

Minimum base pair / arc probability. Arc with lower probability in the
input RNA structure ensembles are ignored.

=item B<-P, --tree-min-prob>=probability

Minimal prob for constructing guide tree. This probability can be set
separately for the all-2-all comparison for constructing the guide
tree and the progressive/iterative alignment steps.

=item B<--max-bps-length-ratio>=factor (0.0)

Maximal ratio of the number of base pairs divided by sequence length
(default: no effect)

=item B<-D, --max-diff-am>=difference

Maximal difference for lengths of matched arcs. Two arcs that have a
higher difference of their lengths are ignored. This speeds up the
alignment, since less arc comparisons (i.e. less DP matrices) have to
be computed. [def: off/-1]

=item B<-d, --max-diff>=difference

Maximal difference of the positions of any two bases that are
considered to be aligned. Bases with higher difference are generally
not aligned. This allows banding of the DP matrices and thus can
result in high speed ups. Note that the semantic changes in the
context of a reference alignment specified with max-diff-aln. Then,
the difference to the reference alignment is restricted. [def: off/-1]

=item B<--max-diff-at-am>=difference

Same restriction as max-diff but only at the ends of arcs in arc
matches. [def: off/-1]

=item B<--min-trace-probability>=probability

Minimal sequence alignment probability of potential traces
(probability-based sequence alignment envelope) [default=1e-4,
moderate filter].

=item B<--max-diff-aln>=file

Computes "realignment" in the environment of the given reference
alignment (file in clustalw format) by constraining the maximum
difference to this reference (controlled by --max-diff). The input
sequences (and their names) have to be identical to these alignment
sequences; however the alignment is allowed to contain extra
sequences, which are ignored. In combination with option --realign,
the reference alignment is taken from the (main) input file. In this
case, the 'file' argument should be '.', but is ignored (with warning)
otherwise.

=item B<--max-diff-relax>

Relax deviation constraints (cf. --max-diff-aln) in multiple
aligmnent. This option is useful if the default strategy for
realignment fails.

=item B<-a, --min-am-prob>=probability (0.001)

Minimum arc-match probability (filters output of locarna-p)

=item B<-b, --min-bm-prob>=probability (0.001)

Minimum base-match probability (filters output of locarna-p)

=back

=head2 Low-level selection of pairwise alignment tools and options

=over 4

=item B<--pw-aligner>

Utilize the given tool for computing pairwise alignments
(def=locarna).

=item B<--pw-aligner-p>=tool

Utilize the given tool for computing partition function pairwise
alignments (def=locarna_p).

=item B<--pw-aligner-options>

Additional option string for the pairwise alignment tool (def="").

=item B<--pw-aligner-p-options>

Additional option string for the partition function pairwise alignment tool (def="").

=back

=head2 Controlling the guide tree construction

=over 4

=item B<--treefile>=file

File with guide tree in NEWICK format. The given tree is used as guide
tree for the progressive alignment. This saves the calculation of
pairwise all-vs-all similarities and construction of the guide tree.

=item B<--similarity-matrix>=file

File with similarity matrix. The similarities in the matrix are used
to construct the guide tree for the progressive alignment. This saves
the calculation of pairwise all-vs-all similarities.

=item B<--score-lists>

Construct the guide tree from pairwise scores in files scores* in the subdirectory scores of the target directory.
The scores are typically precomputed, possibly in a distributed way, using --compute-pairwise-scores.

=item B<--compute-pairwise-scores>=k/N

Compute only the pairwise alignments for the guide tree construction. Write
scores to the file
$tgtdir/scores/scores-$k
and terminate. By computing only the k-th fraction of N parts, the option supports
distributing the computation of the alignments. Before computing the
pairwise scores, the dot plot files should be precomputed using --only-dps.
(see also: --score-lists)

=item B<--graphkernel>

Use the graphkernel for constructing the guide tree.

=item B<--svmsgdnspdk>=program

Specify the svmsgdnspdk program (potentially including path). Default:
use "svmsgdnspdk" in path.

=item B<--fasta2shrep>=program

Program "fasta2shrep" for generating graphs from the input sequences
for use with the graph kernel guide tree generation (potentially
including path). Default: use "fasta2shrep_gspan.pl" in path.

=item B<--fasta2shrep-options>=argument-string

Command line arguments for fasta2shrep. Default: "-wins 200 -shift 50
-stack -t 3 -M 3".

=back

=head2 Controlling multiple alignment construction

=over 4

=item B<--alifold-consensus-dp>

Employs B<RNAalifold -p> for generating consensus dotplot after each
progressive alignment step. This replaces the default consensus
dotplot computation, which averages over the input dot plots.  This
method should be used with care in combination with structural
constraints, since it ignores them for all but the pairwise alignments
of single sequences. Furthermore, note that it does not support
B<--stacking> or B<--new-stacking>.

=item B<--max-alignment-size>=size

Limit the maximum number of sequences that are aligned together by progressive alignment. This can be used to save
unnecessary computations, when producing a clustering of the input RNAs rather than constructing a single multiple alignment.
[default: no limit].

=item B<--local-progressive>

Align only the subalignment of locally aligned subsequences in
subsequent steps of the progressive multiple alignment. Note: this is
only effective if local alignment is turned on. (Default for
sequence local alignment; turn off by B<--global-progressive>)

=item B<--global-progressive>

Use alignments including "locality gaps" in subsequent steps of the
progressive multiple alignment. Note: this is only effective if local
alignment is turned on. (Opposite of B<--local-progressive>)

=item B<--consistency-transformation>

Apply probabilistic consistency transformation (only possible in
probabilistic mode).

=item B<--iterate>

Refine iteratively after progressive alignment. Currently, iterative
refinement optimizes the SCI or RELIABILITY (not the locarna score)! Iterative
refinement realigns all binary splits along the guide tree.

=item  B<--iterations>=number

Refine iteratively for given number of iterations (or stop at
convergence).

=item  B<--it-reliable-structure>=number

Iterate alignment <num> times with reliable structure. This works only
in probabilistic mode, when reliabilities can be computed.

=back

=head2 Further options for probabilistic mode

=over 4

=item  B<--pf-only-basematch-probs>

Use only base match probabilities (no base pair match probabilities).

=item  B<--extended-pf>

 Use extended precision for partition function values. This increases
 run-time and space (less than 2x), however enables handling
 significantly larger instances.

=item  B<--quad-pf>

 Use quad precision for partition function values. Even more precision
 than extended pf, but usually much slower (overrides extended-pf).

=item  B<--pf-scale=<scale>>

Scale of partition function; use for avoiding overflow in larger instances.


=item  B<--fast-mea>

Compute base match probabilities using Gotoh PF-algorithm.


=item  B<--mea-alpha>

Weight of unpaired probabilities in fast mea mode.


=item  B<--mea-beta>

Weight of base pair match contribution in probabilistic mode.


=item  B<--mea-gamma>

Reserved parameter for fast-mea mode.

=item  B<--mea-gapcost>

Turn on gap penalties in probabilistic/mea mode (default: off).

=item B<--write-probs> / B<--no-write-probs>

Write / don't write probabilities (of base matches and arc matches) to the target directory.
Override by single options --(no-)write-bm-probs and --(no-)write-am-probs is possible. Use this to
make the probability files available for post-processing. (default: don't write).

=item  B<--write-bm-probs> / B<--no-write-bm-probs>

Don't write / Write base match probabilities to files in target dir (default: don't write).

=item  B<--write-am-probs> / B<--no-write-am-probs>

Don't write / Write arc match probabilities to files in target dir (default: don't write).

=back

=head2 Miscallaneous modes of operation

=over 4

=item  B<--realign>

Realignment mode. In this mode, the input must be in clustal format
and is interpreted as alignment of the input sequences; the sequences
are obtained by removing all gap symbols. Moreover, the given
alignment is set as reference alignment for --max-diff-aln.  Structure
and anchor constraints can be specified as consensus constraints in
the input; constraints are specified as 'alignment strings' with names
'#A1', '#S', or '#FS' for anchor, structure, or fixed structure
constraints, respectively. Characters in the '#A1' anchor
specification other than '-' and '.' constrain the aligned residues in
the respective column to remain aligned (blanks are disallowed;
annotations '#A2', '#A3', ... are ignored). The consensus structure
constraint is equivalent to constraining each single sequence by the
projection of the consensus constraint to the sequence (removing all
base pairs with at least one gapped end).

=item  B<--dp-cache>=directory

Use directory <dir> as cache for dot plot or pp files (useful for avoiding multiple computation).

=item  B<--only-dps>

Compute only the pair probability files / dot plots, don't align (useful for filling the dp-cache).

=item  B<--evaluate>=file

Evaluate the given multiple alignment (clustalw aln format, or use
--eval-fasta). This requires that probailities are already computed
(mlocarna --probabilistic) and present in the target directory
(--tgtdir).

=item  B<--eval-fasta>

Assume that alignment for evaluation (cf. --evaluate) is in fasta format.

=back

=head2 Constraints

=over 4

=item  B<--anchor-constraints=<file>>

Read anchor constraints from bed format specification.

Anchor constraints in four-column bed format specify positions of
named anchor regions per sequence. The 'contig' names have to
correspond to the fasta input sequence names. Anchor names must be
unique per sequence and regions of the same name for different
sequences must have the same length. This constrains the alignment to
align all regions of the same name.

The specification of anchors via this option removes all anchor
definitions that may be given directly in the fasta input file!

=item  B<--ignore-constraints>

Ignore all constraints (anchor and structure constraints) even if given.

=back

=head2 Rna folding (RNAfold/RNAplfold)

=over 4

=item  B<--noLP> / B<--LP>

Disallow/Allow lonely pairs (default: Disallow).

=item  B<--maxBPspan>

Limit maximum span of base pairs (default off).

=item  B<--relaxed-anchors>

Relax semantics of anchor constraints (default off, meaning 'strict'
semantics). For lexicographically ordered anchors, where each sequence
is annotated with exactly the same names, both semantics are
equivalent; thus, in this common case, the subtle differences can be
ignored. In strict semantics, anchor names must be ordered
lexicographically and can only be aligned in this order. In relaxed
semantics, the only requirement is that equal anchor names are
matched. Consequently, anchor names that don't occur in all sequences
could be overwritten (if two names are assigned to the same position)
or even introduce inconsistencies.

=item  B<--plfold-span=span>

Use RNAplfold with span.

=item  B<--plfold-winsize=ws>

Use RNAplfold with window of size ws (default=2*span).

=item  B<--rnafold-parameter=<file>>

Parameter file for RNAfold (RNAfold's -P option)

=item  B<--rnafold-temperature=<temp>>

Temperature for RNAfold (RNAfold's -T option)

=item  B<--skip-pp>

Skip computation of pair probs if the probabilities are already
existing. Non-existing ones are still computed.

=item B<--no-bpp-precomputation>

Switch off precomputation of base pair probabilties. Overwrite
potentially existing input files.  (compare skip-pp). For use with
special pairwise aligners (e.g. locarna_n) that recompute the base
pair probabilities at each invokation.

=item B<--in-loop-probabilities>

Turn on precomputation of in loop probabilties. For use with special
pairwise aligners (e.g. locarna_n) that use such probabilities.

=back

=head2 Multithreading

=over 4

=item  B<--threads, --cpus>=number

Use the given number of threads for computing pair probabilities and
all-2-all alignments in parallel (multicore/processor support).

Be aware: mlocarna seems not to scale well for more than a few threads (often only 2 or 3).
Using more threads is often detrimental, since it strongly increases memory consumption
due to the current perl threading implementation. This unfortunate behavior seems hard
to improve without major rewrite of the software.

=back

=head2 Getting Help

=over 4

=item  B<--help>

Brief help message

=item  B<--man>

Full documentation

=back

The sequences are given in input file <file> in mfasta
format.  All results are written to a target directory <dir>. If the
file tree is given, contained tree (in NEWICK-tree format) is used as
guide tree for the progressive alignment. The final results are
collected in <tgtdir>/results. The final multiple alignment is
<tgtdir>/results/result.aln.

=head1 EXAMPLES

=head2 Calling mlocarna

[Note that the LocARNA distribution provides files of the following and other
examples in Data/Examples.]

Sequences are typically given in plain fasta format like

=begin roff

.nf
.ft CW

=end roff


    example.fa
    ----------------------------------------
    >fruA
    CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
    >fdhA
    CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
    >vhuU
    AGCUCACAACCGAACCCAUUUGGGAGGUUGUGAGCU
    ----------------------------------------


=begin roff

.ft
.fi

=end roff


To align these sequences, simply call

  mlocarna example.fa

Usually, it makes sense to set additional options; this is either done
on the command line or via configuration files. A reasonable
small configuration for global alignment of large instances would be

=begin roff

.nf
.ft CW

=end roff

    short-example.cfg
    ----------------------------------------
    max-diff-am: 25
    max-diff:    60
    min-prob:    0.01
    plfold-span: 100
    indel:       -50
    indel-open:  -750
    threads:     8   # <- adapt to your hardware
    alifold-consensus-dp
    ----------------------------------------

=begin roff

.ft
.fi

=end roff


To use it, call

    mlocarna --config short-example.cfg example.fa

which is equivalent to

=begin roff

.nf
.ft CW

=end roff

    mlocarna --max-diff-am 25 --max-diff 60 --min-prob 0.01 \
             --indel -50 --indel-open -750 \
             --plfold-span 100 --threads 8 --alifold-consensus-dp \
             example.fa

=begin roff

.ft
.fi

=end roff

For  probabilistic alignment with consistency transformation, call

  mlocarna --probabilistic --consistency-transform example.fa

In both cases, mlocarna writes the main results to stdout and more
detailed results to the target directory example.out. The results
directory is overwritten if it exists already. To avoid this, one can
specify the target directory (--tgtdir).

=head2 Use of constraints

Mlocarna supports structure constraints for folding and anchor
constraints for alignment. Both types of constraints can be specified
in extension of the standard fasta format via 'constraint
lines'. Fasta-ish input with constraints looks like this

=begin roff

.nf
.ft CW

=end roff


    example-w-constraints.fa
    ----------------------------------------
    >A
    GACCCUGGGAACAUUAACUACUCUCGUUGGUGAUAAGGAACA
    ..((.(....xxxxxx...................))).xxx #S
    ..........000000.......................111 #1
    ..........123456.......................123 #2
    >B
    ACGGAGGGAAAGCAAGCCUUCUGCGACA
    .(((....xxxxxx.......))).xxx #S
    ........000000...........111 #1
    ........123456...........123 #2
    ----------------------------------------


=begin roff

.ft
.fi

=end roff

The same anchor constraints (like by the lines tagged #1, #2) can
alternatively be specified in bed format by the entries

=begin roff

.nf
.ft CW

=end roff

    example-anchors.bed
    ----------------------------------------
    A	10	16	first_box
    B	8	14	first_box
    A	39	42	ACA-box
    B	25	28	ACA-box
    ----------------------------------------

=begin roff

.ft
.fi

=end roff

where anchor regions (boxes) have arbitrary but matching names
and contig/sequence names correspond to the sequence names
of the fasta(-like) input.

Given, e.g.

=begin roff

.nf
.ft CW

=end roff

    example-wo-anchors.fa
    ----------------------------------------
    >A
    GACCCUGGGAACAUUAACUACUCUCGUUGGUGAUAAGGAACA
    ..((.(....xxxxxx...................))).xxx #S
    >B
    ACGGAGGGAAAGCAAGCCUUCUGCGACA
    .(((....xxxxxx.......))).xxx #S
    ----------------------------------------

=begin roff

.ft
.fi

=end roff

one calls

  mlocarna --anchor-constraints example-anchors.bed  example-wo-anchors.fa

=head2 Realignment

In realignment mode (option --realign), mlocarna is called with an
input alignment in clustal format, e.g.

  mlocarna --realign example-realign.aln

This allows to define constraints as 'consensus constraints' in the input, e.g.

=begin roff

.nf
.ft CW

=end roff

    example-realign.aln
    ----------------------------------------
    CLUSTAL W

    fruA               --CCUCGAGGGGAACCCGAA-------------AGGGACCCGAGAGG--
    vhuU               AGCUCACAACCGAACCCAUU-------------UGGGAGGUUGUGAGCU
    fdhA               CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAG-GUGGCG
    #A1                ..*...........CCC.............................5..
    #S                 ((((((.((((...(((.................))).)))).))))))
    ----------------------------------------

=begin roff

.ft
.fi

=end roff

Note that anchor names are arbitrary and the consensus structure is
'projected' to the single sequences.  Moreover, the input alignment
can be used as reference for fast limited realignment, e.g. call to
realign in distance 5 of the reference alignment:

  mlocarna --realign example-realign.aln --max-diff 5 --max-diff-aln .

=head1 AUTHORS

Sebastian Will
Christina Otto (ExpaRNA-P, sparsification classes for ExpaRNA-P and SPARSE)
Milad Miladi (SPARSE)

=head1 ONLINE INFORMATION

For download and online information, see
L<https://github.com/s-will/LocARNA>
and L<http://www.bioinf.uni-freiburg.de/Software/LocARNA>.

Latest releases are available as source code on Github at
L<https://github.com/s-will/LocARNA/releases>.


=head1 REFERENCES

Sebastian Will, Kristin Reiche, Ivo L. Hofacker, Peter F. Stadler, and
Rolf Backofen.  Inferring non-coding RNA families and classes by means
of genome-scale structure-based clustering.  PLOS Computational
Biology, 3 no. 4 pp. e65, 2007.
doi:10.1371/journal.pcbi.0030065

Sebastian Will, Tejal Joshi, Ivo L. Hofacker, Peter F. Stadler,
and Rolf Backofen. LocARNA-P: Accurate boundary prediction and
improved detection of structural RNAs. RNA, 18 no. 5 pp. 900-914, 2012.
doi:10.1261/rna.029041.111

Sebastian Will, Michael Yu, and Bonnie Berger. Structure-based Whole
Genome Realignment Reveals Many Novel Non-coding RNAs. Genome Research,
no. 23 pp. 1018-1027, 2013. doi:10.1101/gr.137091.111

=cut

use strict;
use warnings;

use autodie qw(open close);

use FindBin;
use lib "$FindBin::Bin/../lib/perl";
my $prefix= "$FindBin::Bin/..";
my $bindir = "$FindBin::Bin";

use MLocarna;
use MLocarna::SparseMatrix;
use MLocarna::Tree;
use MLocarna::MatchProbs;
use MLocarna::Aux;
use MLocarna::NameNormalizer;

use Cwd qw(abs_path);
use List::Util qw(min max);

use File::Copy qw(copy);
use File::Path qw(rmtree);

########################################
## threading support
##
use Config;

my $THREADS_ENABLED = $Config{useithreads};

if ($THREADS_ENABLED) {
    require MLocarna::threaded;
    import MLocarna::threaded;
} else {
    require MLocarna::unthreaded;
    import MLocarna::unthreaded;
}

## ------------------------------------------------------------
## global constants

$MLocarna::PACKAGE_STRING=readpipe "$bindir/locarna --version";
chomp $MLocarna::PACKAGE_STRING;

## vienna package programs
my $RNAfold = "RNAfold";
my $RNAplfold = "RNAplfold";

my $input_dir="input";
my $probs_dir="probs";
my $intermediate_dir="intermediates";
my $results_dir="results";

## turn off computation of reliability profiles/dotplots for single
## sequences
my $reliabilities_single_sequences = 1;
## control where to write single arcmatch rels
my $single_reliabilities_dir="single_reliabilities";

##------------------------------------------------------------
## options

# we use the GetOpt module in a mode that allows
# to pass a list of options quite easily to locarna
# while handling other options explicitely

use Getopt::Long;
use Pod::Usage;

my @options = #options that are not or not automatically passed to locarna
    (
     "noLP",
     "LP",

     "free-endgaps",
     "free-endgaps-3",
     "free-endgaps-5",

     "pw-aligner=s",
     "pw-aligner-p=s",

     "pw-aligner-options=s",
     "pw-aligner-p-options=s",

     "min-prob|p=f",
     "tree-min-prob|P=f",

     "stacking",

     "new-stacking",

     # "extlib",

     "skip-pp",

     "alifold-consensus-dp",

     "max-diff-aln=s",

     "iterate",
     "iterations=i",

     "it-reliable-structure=i",

     "ignore-constraints",
     "consistency-transformation",

     "treefile=s",
     "similarity-matrix=s",

     "compute-pairwise-scores=s",
     "score-lists",

     "graphkernel",
     "svmsgdnspdk=s",
     "svmsgdnspdk-radius=s",
     "svmsgdnspdk-distance=s",
     "fasta2shrep=s",
     "fasta2shrep-options=s",

     "tgtdir|outdir=s",

     "min-bm-prob=f",
     "min-am-prob=f",

     "no-write-probs",
     "write-probs",
     "no-write-bm-probs",
     "no-write-am-probs",
     "write-bm-probs",
     "write-am-probs",

     "mea-alignment",
     "mea-alpha=i",
     "mea-beta=i",
     "mea-gamma=i",
     "mea-gapcost",

     "probabilistic",
     "sparse",
     "extended-pf",
     "quad-pf",
     "pf-scale=f",
     "only-basematch-probs",
     "fast-mea",

     "dp-cache=s",
     "only-dps",

     "rnafold-parameter=s",
     "rnafold-temperature=f",

     ## plfold
     "plfold-span=i",
     "plfold-winsize=i",

     "cpus=i",
     "threads=i",

     "evaluate=s",
     "eval-fasta",

     "no-bpp-precomputation",

     "in-loop-probabilities",

     "write-structure",

     "width=i",

     "max-alignment-size=i",

     "keep-sequence-order",

     "stockholm",
     "consensus-structure=s",

     "local-progressive",
     "global-progressive",

     "anchor-constraints=s",

     "realign",

     "verbose|v",
     "moreverbose",
     "quiet",

     "version",

     "help",
     "man"
    );


# options with argument that are passed to locarna
# autopass options will always be passed to locarna for tree computation
# and progressive steps *and* to locarna_p
#
my @autopass_arg_options =
    (
     "match|m=i",
     "mismatch|M=i",
     "indel|i=i",
     "indel-opening=i",
     "unpaired-penalty=i",
     "ribosum-file=s",
     "use-ribosum=s",
     "ribofit=s",
     "struct-weight|s=i",
     "tau|t=i",
     "exclusion|E=i",
     "exp-prob|e=f",

     "penalized=i",

     "max-diff-am=i",
     "max-diff=i",
     "max-diff-at-am=i",

     "min-trace-probability=f",

     "maxBPspan=i",

     "struct-local=s",
     "sequ-local=s",

     "probcons-file=s",
     "match-prob-method=i",

     "temperature=i",
     "pf-struct-weight=i",

     "temperature-alipf=i",

     "max-bps-length-ratio=f"
    );

my @autopass_noarg_options = ( "max-diff-relax", "relaxed-anchors" );


## ----------------------------------------
## the hash of options
#
my $opts;

## ----------------------------------------
## some default values
#

# handle noLP / LP later
# $opts->{'noLP'}=1;

$opts->{'skip-pp'}=0; ## =1 skips the computation of pair probabilities
                    ## for files that exist already
$opts->{'it-reliable-structure'}=0;

$opts->{'min-prob'}=0.001;

$opts->{'struct-weight'}=200;
$opts->{'max-diff-am'}=30;

$opts->{'threads'}=1;

$opts->{'pw-aligner'}="$bindir/locarna";
$opts->{'pw-aligner-p'}="$bindir/locarna_p";

$opts->{'pw-aligner-options'}="";
$opts->{'pw-aligner-p-options'}="";

# defaults for experimental graph kernel interface
$opts->{'svmsgdnspdk'}="svmsgdnspdk";
$opts->{'svmsgdnspdk-radius'}=2;
$opts->{'svmsgdnspdk-distance'}=4;

$opts->{'fasta2shrep'}="fasta2shrep_gspan.pl";

# default options of fasta2shrep. In particular, specify
# -wins window size
# -shift window shift in percent of window size
# -stack extra stack vertices
# -t grammar abstraction level (of RNAshape)
# -M number of shapes
#
$opts->{'fasta2shrep-options'}="-wins 200 -shift 50 -stack -t 3 -M 3";


## ----------------------------------------
## if config file is given, read config into options hash to set
## default options
my $config_file;

## we run two option parses
## 1) detect option --config and if present load defaults from file
## 2) parse all other options

## allow unknown options to pass through for the first parse
#
Getopt::Long::Configure("pass_through");
#
if (GetOptions( 'configure=s' => \$config_file,
                'verbose' => \$opts->{'verbose'},
                'quiet' => \$opts->{'quiet'}
              ) && defined($config_file)) {
    print "Load configuration from $config_file.\n" unless $opts->{'quiet'};
    {
        open(my $in, "<","$config_file");
        load_keyvalue_pairs( $in, $opts );
        close $in;
    }
}

## perform main option parse
#
Getopt::Long::Configure("no_ignore_case");
#
GetOptions( $opts, @options, @autopass_arg_options, @autopass_noarg_options )
    || pod2usage(2);



# ----------------------------------------
# synonymous options
$opts->{'threads'} = $opts->{'cpus'} if defined $opts->{'cpus'};
$opts->{'cpus'} = $opts->{'threads'} if defined $opts->{'threads'};

## print out options hash -- for debugging
# for my $key (keys %$opts) {
#     my $value=$opts->{$key};
#     if (defined $value) {
#         print "$key\t=\t$value\n";
#     }
# }
#

help_msg() if $opts->{'help'};
pod2usage(-exitstatus => 0, -verbose => 2) if $opts->{'man'};


## option --version

if ($opts->{'version'}) {
   system("$bindir/locarna", "--version");
   exit(0);
}


# parameter mfasta file
## input-file = filename of fasta file with input sequences
if ($#ARGV == 0) {
    $opts->{'input-file'} = $ARGV[0];
}else {
    help_msg();
}



sub help_msg {
    print STDERR
"USAGE: mlocarna [options] <fasta file>

Multiply aligns the RNA sequences in the fasta file based on
sequence-structure similarity.

For details on options call mlocarna --man.
";
exit 0;
}


sub die_hard {
    my ($errtext)=@_;
    printerr $errtext;
    exit(-1);
}


############################################################
# load key value pairs
#
# load pairs of 'key: value' pairs and single keys from file handle
# into hash; blanks and '#'-prefixed comments are ignored
#
# @param $fh input file handle
# @param $hash reference to hash
#
sub load_keyvalue_pairs {
    my ($fh,$hash)=@_;
    while(<$fh>) {
        $_ =~ s/\s*#.*$//; # remove comments
        next if $_ =~ /^\s*$/; # ignore whitespace lines

        $_ =~ s/^\s+//; # remove whitespace prefix
        $_ =~ s/\s+$//; # remove whitespace suffix

        if ( $_ =~ /^(\S+)\s*:\s*(.+)$/ ) { # line "key: val"
            $hash->{$1} = $2;
        } elsif ( $_ =~ /^(\S+)$/ ) { # line "key"
            $hash->{$1} = 1;
        } else {
            print STDERR "  WARNING: line $_ does not define key or proper colon-separated key value pair.";
        }
    }
}


## ----------------------------------------
## verbosity level
if ($opts->{'quiet'}) {
    $MLocarna::Aux::verbosemode=4;
}

# mode 3 is 'normal' verbosity

if ($opts->{'verbose'}) {
    $MLocarna::Aux::verbosemode=2;
}

if ($opts->{'moreverbose'}) {
    $opts->{'verbose'}=1;
    $MLocarna::Aux::verbosemode=1;
}

## make STDOUT 'hot', so output does not get mixed up when redirected
$|=1;

## ------------------------------------------------------------
## write header

printmsg 3,"mLocARNA --- multiple Local (and global) Alignment of RNA --- ";
printmsg 3,readpipe("$bindir/locarna --version");


## target/output directory
if (!defined($opts->{'tgtdir'})) {
    $opts->{'tgtdir'} = $opts->{'input-file'};
    $opts->{'tgtdir'} =~ s/\.mfasta$|\.fasta$|\.fa$|\.aln$//;
    $opts->{'tgtdir'} .= ".out";
}

if (defined($opts->{'evaluate'})) {
    printmsg 3, "EVALUATION MODE\n\n";

    run_evaluation_mode($opts->{'evaluate'},$opts->{'tgtdir'}."/probs");

    exit 0;
}

# LP and noLP should never both be given on the command line
if (defined($opts->{'LP'}) && defined($opts->{'noLP'})) {
    printerr "ERROR: The flags --noLP and --LP contradict each other.\n";
    exit(-1);
}

# this makes noLP the default, unless LP is given
if (defined($opts->{'LP'})) {
    $opts->{'noLP'}=0;
} else {
    $opts->{'noLP'}=1;
}


## if option sparse is given, then use sparse as pairwise aligner
if ($opts->{'sparse'}) {
    $opts->{'pw-aligner'} = "$bindir/sparse";
    $opts->{'in-loop-probabilities'}=1;
}

## construct parameter string for locarna

my @locarna_params = split(/\s+/, $opts->{'pw-aligner-options'});
my @locarna_p_params = split(/\s+/, $opts->{'pw-aligner-p-options'});

## first set parameters that are equal for tree construction and other locarna runs
## (we copy locarna_params to locarna_params_tree later)

my $endgaps_string="++++";
if (defined($opts->{'free-endgaps-3'})) {
    $opts->{'free-endgaps'}=1;
    $endgaps_string="-+-+" unless defined($opts->{'free-endgaps-5'});
} elsif (defined($opts->{'free-endgaps-5'})) {
    $opts->{'free-endgaps'}=1;
    $endgaps_string="+-+-";
}
if (defined($opts->{'free-endgaps'})) {
    if($opts->{'probabilistic'}) {
        printerr "ERROR: no free-end gaps in probabilistic mode.\n";
        exit(-1);
    } else {
        push @locarna_params, "--free-endgaps" => $endgaps_string;
    }
}

if (defined($opts->{'only-basematch-probs'})) {
    push @locarna_p_params, "--include-am-in-bm";
}


if (defined($MLocarna::MatchProbs::min_bm_prob)) {
    push @locarna_p_params, "--min-bm-prob" => $MLocarna::MatchProbs::min_bm_prob;
}

if (defined($MLocarna::MatchProbs::min_am_prob)) {
    push @locarna_p_params, "--min-am-prob" => $MLocarna::MatchProbs::min_am_prob;
}

if (defined($opts->{'extended-pf'})) {
    push @locarna_p_params, "--extended-pf";
}

if (defined($opts->{'quad-pf'})) {
    push @locarna_p_params, "--quad-pf";
}

if (defined($opts->{'pf-scale'})) {
    push @locarna_p_params, "--pf-scale" => $opts->{'pf-scale'};
}

## handle plfold
if (defined($opts->{'plfold-span'})) {
    printmsg 1,"Use plfold for local folding.\n";
    if (!defined($opts->{'plfold-winsize'})) {
        $opts->{'plfold-winsize'} = 2*$opts->{'plfold-span'};
    }
    if (defined($opts->{'stacking'}) || defined($opts->{'new-stacking'})) {
        printerr "ERROR: Can't use stacking with plfold.\n";
        exit -1;
    }
} elsif (defined($opts->{'plfold-winsize'})) {
    printerr "Warning: winsize for plfold given, but no span. Winsize ignored.\n";
}

push @locarna_params, "-v" if ($opts->{'moreverbose'});

# ----------------------------------------
# handle autopass options
#
foreach my $opt (@autopass_arg_options) {
    $opt =~ s/[|=].*//;

    if (exists $opts->{$opt}) {
        push @locarna_params, "--$opt" => $opts->{$opt};
    }
}

foreach my $opt (@autopass_noarg_options) {
    $opt =~ s/[|].*//;

    if (exists $opts->{$opt}) {
        push @locarna_params, "--$opt";
    }
}
#
# ----------------------------------------


push @locarna_p_params, @locarna_params;

## options only passed to locarna (not to locarna_p)
if (defined($opts->{'width'})) {
    push @locarna_params, "--width" => $opts->{'width'};
} else {
    $opts->{'width'}=120;
}

push @locarna_params, "--noLP" if ($opts->{'noLP'});

if ($opts->{'probabilistic'}) {
    push @locarna_params, "--mea-alignment";

    push @locarna_params, "--mea-gapcost" if $opts->{'mea-gapcost'};

    push @locarna_params, "--mea-beta" => $opts->{'mea-beta'} if defined($opts->{'mea-beta'});

    if ( defined($opts->{'fast-mea'}) ) {
        push @locarna_params, "--mea-alpha" => $opts->{'mea-alpha'} if defined($opts->{'mea-alpha'});
        push @locarna_params, "--mea-gamma" => $opts->{'mea-gamma'} if defined($opts->{'mea-gamma'});
    }
}

# whether to compute consensus structures and which type
if (not defined($opts->{'consensus-structure'})) {
    if ($opts->{"alifold-consensus-dp"}) {
        $opts->{'consensus-structure'}="alifold";
    } else {
        $opts->{'consensus-structure'}="none";
    }
}

push @locarna_params, "--consensus-structure" => $opts->{'consensus-structure'};

## copy to tree params
my @locarna_params_tree = ();
push @locarna_params_tree, @locarna_params;

## special treatment of min-prob and tree-min-prob

push @locarna_p_params, "--min-prob" => $opts->{'min-prob'} if (defined($opts->{'min-prob'}));

if ($opts->{'probabilistic'} && $opts->{'only-basematch-probs'}) {
    # when using only basematch probabilities,
    # turn off structural scoring for locarna

    push @locarna_params, "--min-prob" => "1";
    push @locarna_params_tree, "--min-prob" => "1";
} else {

    push @locarna_params, "--min-prob" => $opts->{'min-prob'} if (defined($opts->{'min-prob'}));

    if (defined($opts->{'tree-min-prob'})) {
        push @locarna_params_tree, "--min-prob" => $opts->{'tree-min-prob'};
    } else {
        push @locarna_params_tree, "--min-prob" => $opts->{'min-prob'} if (defined($opts->{'min-prob'}));
    }
}

if ($opts->{'write-structure'}) {
    push @locarna_params, "--write-structure";
}

##################################################
## use global or local alignments in subsequent progressive steps
##
## make local progressive default, if sequence local alignment is turned on
if ((grep $opts->{'sequ-local'}, ("on","true","t","1"))
    && !$opts->{'global-progressive'}) {
    $opts->{'local-progressive'}=1;
}

if ($opts->{'local-progressive'}) {
    if ($opts->{'global-progressive'}) {
        printerr "Warning: local overrides global progressive alignment!\n"
    }
    push @locarna_params, "--local-file-output";
}


## no stacking for locarna_p
push @locarna_params, "--stacking" if ($opts->{'stacking'});
push @locarna_params_tree, "--stacking" if ($opts->{'stacking'});

push @locarna_params, "--new-stacking" if ($opts->{'new-stacking'});
push @locarna_params_tree, "--new-stacking" if ($opts->{'new-stacking'});

## pass alifold-consensus-dp
push @locarna_params, "--alifold-consensus-dp" if ($opts->{'alifold-consensus-dp'});
# push @locarna_params_tree, "--alifold-consensus-dp" if ($opts->{'alifold-consensus-dp'});


######## ------------------------------------------------------------
#### do some consistency checks on the parameters
##
#
#

if ($opts->{'consistency-transformation'} && !$opts->{'probabilistic'}) {
    printerr "ERROR: Consistency transformation is supported only in probabilistic mode. Exit.\n";
    exit -1;
}

if (!$opts->{'probabilistic'} && ($opts->{'write-probs'} || $opts->{'write-bm-probs'} || $opts->{'write-am-probs'})) {
    printerr "WARNING: Writing of probabilities only possible in probabilistic mode.\n";
}

if (($opts->{'fast-mea'} || $opts->{'only-basematch-probs'}) && $opts->{'write-am-probs'}) {
    printerr "WARNING: Writing of arc match probabilities requested, but their computation is disabled.\n";
}

if (defined($opts->{'rnafold-temperature'}) && $opts->{'rnafold-temperature'}!=37 && defined($opts->{'alifold-consensus-dp'})) {
    printerr "WARNING: alifold consensus dotplots are computed at 37°C\n";
    printerr "         (different to initial folding temperature T=$opts->{'rnafold-temperature'})\n\n";
}

if (defined($opts->{'rnafold-parameter'}) && defined($opts->{'alifold-consensus-dp'})) {
    printerr "WARNING: alifold consensus dotplots are computed with standard parameters\n\n";
}

### global variables for reference alignment
my $refaln;          # reference alignment
my $refaln_filename; # filename of reference alignment

##### ----------------------------------------
#### Writing of probabilities

## don't write probabilities to disk by default; turn off unless explicitly wanted
if (1 || defined($opts->{'no-write-probs'})) {
    $opts->{'write-bm-probs'} = 0 unless defined($opts->{'write-bm-probs'});
    $opts->{'write-am-probs'} = 0 unless defined($opts->{'write-am-probs'});;
}

# collectively turn on write-probs unless am or bm are turned off
if (defined($opts->{'write-probs'})) {
    $opts->{'write-bm-probs'} = 1 unless defined($opts->{'no-write-bm-probs'});
    $opts->{'write-am-probs'} = 1 unless defined($opts->{'no-write-am-probs'});
}

####
## get full, absolute path for parameter file
if (defined($opts->{'rnafold-parameter'})) {
    if ((!defined(abs_path($opts->{'rnafold-parameter'})) || (! -e $opts->{'rnafold-parameter'}))) {
        printerr "WARNING: RNAfold parameters $opts->{'rnafold-parameter'} not found (use default)!\n";
        $opts->{'rnafold-parameter'} = undef;
    } else {
        $opts->{'rnafold-parameter'} = abs_path($opts->{'rnafold-parameter'});
    }
}

####
## parameters for RNAfold and RNAalifold (parameter file,
## temperature, and maxBPspan)
my @RNAfold_args = ();
if (defined($opts->{'rnafold-parameter'})) {
    push @RNAfold_args, "-P" => $opts->{'rnafold-parameter'};
}

if (defined($opts->{'rnafold-temperature'})) {
    push @RNAfold_args, "-T" => $opts->{'rnafold-temperature'};
}

if (defined($opts->{'maxBPspan'}) && $opts->{'maxBPspan'} ne "-1") {
    push @RNAfold_args, "--maxBPspan" => $opts->{'maxBPspan'};
}

####
## check existence of pairwise aligner
## search in PATH, unless absolute or relative path is given
## get full, absolute path

if (defined($opts->{'pw-aligner'})) {
    $opts->{'pw-aligner'} = find_in_exec_path_or_error($opts->{'pw-aligner'});
}

####
## get full, absolute path for pairwise aligner p
if (defined($opts->{'pw-aligner-p'})) {
    $opts->{'pw-aligner-p'} = find_in_exec_path_or_error($opts->{'pw-aligner-p'});
}


#### ------------------------------------------------------------

if ($opts->{'probabilistic'} && ! $opts->{'fast-mea'}) {
    printmsg 2,"Locarna-P parameters (for computing probabilities): @locarna_p_params\n";
}
printmsg 2,"Locarna parameter for progressive alignment: @locarna_params\n";
printmsg 2,"Locarna parameter for guide tree construction: @locarna_params_tree\n";

##------------------------------------------------------------

use Sys::Hostname qw(hostname);
my $global_tmpname="_tmp-".hostname."-".$$."_"; # valid file name, at max 12 characters
my $global_tmpprefix="$global_tmpname"; # could be a absolute path, but there's no need to

### ============================================================================
### ============================================================================
### ============================================================================
## ---------------------------------------------------------------------------
# main program
#

## GLOBAL VARS (since passing this arround is ugly:))
my $bmprobs = {}; # base match probabilities
my $amprobs = {}; # arc match probabilities

## use for storing untransformed probs
## in case of consistency-transformation
my $bmprobs_nocbt = {}; # untransformed base match probabilities
my $amprobs_nocbt = {}; # untransformed arc match probabilities

# name normalizer
my $nnorm;

## CALL main
main();

exit(0);

sub main {

    # activate skip-pp when mlocarna is used for distributed computation
    if (defined($opts->{'compute-pairwise-scores'}) or defined($opts->{'score-lists'})) {
        $opts->{'skip-pp'} = 1;
    }

    if (defined($opts->{'compute-pairwise-scores'}) and ! -d $opts->{'tgtdir'}."/input" ) {
        printerr "ERROR: Option --compute-pairwise-scores requires precomputed\n"
                ."       dot plots in the target directory.\n"
                ."       Compute these data by mlocarna with --only-dps.\n";
        exit -1;
    }
    if (defined($opts->{'score-lists'})
            and (! -d $opts->{'tgtdir'}."/input"
                 or ! -d $opts->{'tgtdir'}."/scores")) {
        printerr "ERROR: Option --score-lists requires precomputed dot plots\n"
                ."       and score lists in the target directory.\n"
                ."       Please consult the documentation (e.g. mlocarna --man).\n";
        exit -1;
    }

    if (! -e $opts->{'tgtdir'}) {
        mkdir $opts->{'tgtdir'};
        if (! -e $opts->{'tgtdir'} ) {
            printerr "ERROR: Cannot create target directory $opts->{'tgtdir'}. Exit.";
            exit -1;
        }
    } else {
        if ( -d $opts->{'tgtdir'} ) {
            printmsg 2, "Warning: $opts->{'tgtdir'} exists already.\n"
                unless defined($opts->{'skip-pp'});
        } else {
            printerr "ERROR: ".$opts->{'tgtdir'}." is not a directory. Exit.\n";
            exit -1;
        }
    }

    my $seqs;            # input sequences

    ## ----------------------------------------
    # read input sequences from file
    #
    if ($opts->{'realign'}) {
        $refaln = read_clustalw_alnloh($opts->{'input-file'}, 1);
        $refaln_filename = abs_path($opts->{'input-file'});
        $seqs = project_sequences_for_realign($refaln);
    } else {
        $seqs = read_fasta($opts->{'input-file'}) ;
    }

    foreach my $seq (@{$seqs}) {
        if (not defined($seq->{seq}) or $seq->{seq} eq "") {
            printmsg 2,"WARNING: Sequence $seq->{name} is empty.\n";
            $seq->{seq}="";
        }
    }

    ##### ------------------------------------------------------------
    ## limited realignment: max_diff_aln

    if (defined($opts->{'max-diff-aln'})) {
        if ($opts->{'realign'}) {
            ## require '.' argument to max-diff-aln
            if (!($opts->{'max-diff-aln'} eq "." or $opts->{'max-diff-aln'} eq $opts->{'input-file'})) {
                printerr "WARNING: due to '--realign', the argument of max-diff-aln is ignored;\n";
                printerr "         to avoid this warning, set it to '.' (or identical to the input file).\n";
            }
        } else {
            $refaln_filename=abs_path($opts->{'max-diff-aln'});
            ## read alignment from file
            $refaln = read_clustalw_alnloh("$refaln_filename");
        }
    }

    ##### ------------------------------------------------------------

    ## register normalized names for them that can be used as filenames
    $nnorm = new MLocarna::NameNormalizer($seqs);

    ## $nnorm->print();


    ## check input: we need at least two input sequences
    if (@$seqs == 0) {
        printerr("ERROR: No input sequences found in $opts->{'input-file'}.\n");
        exit(-1);
    }
    if (@$seqs == 1) {
        printerr("ERROR: Only one input sequence found in $opts->{'input-file'}.\n");
        exit(-1);
    }

    if ( defined $opts->{'anchor-constraints'} ) {
        print "Read anchor constraints from file $opts->{'anchor-constraints'}...\n" if $opts->{'verbose'};
        my $regions = read_anchor_constraints($seqs,
                                              $opts->{'anchor-constraints'});
        if ($opts->{'verbose'}) {
            print "ANCHOR REGIONS: @{$regions}\n";
        }
    }

    ## make sure anchor and structure constraint annotation is valid,
    ## otherwise fail
    constraint_annotation_is_valid_or_die($seqs);

    ## check whether sequences and reference alignment are compatible
    if ($opts->{'max-diff-aln'} && (! is_aln_of_seqs($seqs,$refaln)) ) {
        print STDERR "Inconsistent input: Given reference alignment "
          ."( --max-diff-aln \"$opts->{'max-diff-aln'}\" ) "
          ."does not match input sequences ( \"$opts->{'input-file'}\" ).\n";
        exit(-1);
    }

    ## check compatibility of options
    if (defined($opts->{'probabilistic'})
        and (defined($opts->{'compute-pairwise-scores'}) or defined($opts->{'score-lists'}))) {
        printerr "ERROR: The precomputation of pairwise alignments using compute-pairwise-scores and score-lists "
            ."is not yet available in probabilistic alignment mode.";
        exit -1;
    }

    ## parse and error check argument of compute-pairwise-scores
    if (defined($opts->{'compute-pairwise-scores'})) {
        if ($opts->{'compute-pairwise-scores'} =~ /^(\d+)\/(\d+)$/) {
            my $k=$1;
            my $N=$2;

            if ($k<1 or $k>$N) {
                printerr "Argument parse error: For option --compute-pairwise-scores, k must between 1 and N.";
                exit(-1)
            }

            $opts->{'compute-pairwise-scores'} = {
            'k' => $k,
            'N' => $N,
           };
        } else {
            printerr "Argument parse error: --compute-pairwise-scores expects argument k/N";
            exit(-1)
        }
    }

    ## ----------------------------------------
    ## make target directory the current directory
    ## and make paths to later used files absolute
    ##
    $opts->{'tgtdir'} = abs_path($opts->{'tgtdir'});
    if (defined($opts->{'treefile'})) { $opts->{'treefile'} = abs_path($opts->{'treefile'}); }
    if (defined($opts->{'similarity-matrix'})) {
        $opts->{'similarity-matrix'} = abs_path($opts->{'similarity-matrix'}); }
    if (defined($opts->{'dp-cache'}))  { $opts->{'dp-cache'}  = abs_path($opts->{'dp-cache'});  }

    mkdir $opts->{'tgtdir'}."/$input_dir";

    if ($opts->{'probabilistic'} && ($opts->{'write-am-probs'} || $opts->{'write-bm-probs'}) ) {
        mkdir $opts->{'tgtdir'}."/$probs_dir";
    }

    mkdir $opts->{'tgtdir'}."/$intermediate_dir";

    mkdir $opts->{'tgtdir'}."/$results_dir";

    ## keep the input file in input_dir
    {
        my $inputsuffix = "fa";
        if ($opts->{'realign'}) {
            # in realignment, mode use suffix '.aln'
            $inputsuffix = 'aln';
        }
        copy($opts->{'input-file'}, $opts->{'tgtdir'}."/$input_dir/input.$inputsuffix");
    }

    ## ----------------------------------------
    # perform multiple alignment
    #
    perform_multiple_alignment($seqs);

    ## ----------------------------------------
    # compute reliabilities and print reliability plots
    #
    if ($opts->{'probabilistic'}) {

        ## iterate $opts->{'it-reliable-structure'} times using reliable structures as structure constraints
        for (my $it=0; $it<$opts->{'it-reliable-structure'}; $it++) {
            my $aln = read_aln_wo_anno($opts->{'tgtdir'}."/$results_dir/result.aln");

            my $reliable_structures = compute_and_write_reliabilities($aln);

            printmsg 3, "\nIterate alignment with structure constraints from reliability information.\n";

            ## add constraints for new round of iteration
            constrain_sequences_from_reliable_structures($seqs,$reliable_structures);

            perform_multiple_alignment($seqs);
        }

        my $aln = read_aln_wo_anno($opts->{'tgtdir'}."/$results_dir/result.aln");
        compute_and_write_reliabilities($aln);

    } ## end if probabilistic

    printmsg 3, "Results written to target directory $opts->{'tgtdir'}.\n"

} ## end main

sub scorelist_dirname {
    return $opts->{'tgtdir'}."/scores";
}

sub scorelist_files {
    opendir(my $dh, scorelist_dirname()) || return ();
    return map { scorelist_dirname().'/'.$_ }
        (grep /^scores/, readdir($dh));
}

sub scorelist_filename {
    my $k = shift;
    my $N = shift;
    return scorelist_dirname()."/scores-".$k;
}

## ----------------------------------------
## compute and print reliabilities
##
## @param %aln the multiple alignment
##
sub compute_and_write_reliabilities {
    my ($aln) = @_;

    my @names = keys %$aln;
    @names = grep {!/\#/} @names;

    my ($bmrels_seq, $bmrels_str, $amrels);

    if ($opts->{'consistency-transformation'}) {
        ($bmrels_seq, $bmrels_str, $amrels)
            = MLocarna::MatchProbs::compute_reliability($aln,$bmprobs_nocbt,$amprobs_nocbt,$nnorm);
    } else {
        ($bmrels_seq, $bmrels_str, $amrels)
            = MLocarna::MatchProbs::compute_reliability($aln,$bmprobs,$amprobs,$nnorm);
    }

    my $width = $opts->{'width'};
    my $namewidth = aln_namewidth(\@names);

    # compute and print a maximum reliability structure for the consensus reliabilities
    my @empty=();
    my ($score,$rel_str)=MLocarna::MatchProbs::max_weight_structure(aln_length($aln),\@empty,$amrels,1,0.125);
    printmsg 3, sprintf("\n%-".$namewidth."s %s\n","max. reliability",$rel_str);

    if (3 >= $MLocarna::Aux::verbosemode) {
        printmsg 3, "reliability\n";
        MLocarna::MatchProbs::write_reliability_bars($bmrels_seq, $bmrels_str,
                                                     $namewidth, $width);
    }

    MLocarna::MatchProbs::write_bm_reliabilities($opts->{'tgtdir'}."/$results_dir/result.bmreliability",
                                                 $bmrels_seq,$bmrels_str);
    MLocarna::MatchProbs::write_am_reliabilities($opts->{'tgtdir'}."/$results_dir/result.amreliability",$amrels);

    MLocarna::MatchProbs::write_dotplot($opts->{'tgtdir'}."/$results_dir/reldot.ps",consensus_sequence($aln),$amrels);

    if ($opts->{'consistency-transformation'}) {

        my ($bmrels_seq, $bmrels_str, $amrels)
            = MLocarna::MatchProbs::compute_reliability($aln,$bmprobs,$amprobs, $nnorm);

        if ($opts->{'verbose'}) {
            printmsg 3, "reliability (cbt)\n";
            if (3 >= $MLocarna::Aux::verbosemode) {
	        MLocarna::MatchProbs::write_reliability_bars($bmrels_seq, $bmrels_str,
                                                             $namewidth, $width);
            }
        }

        MLocarna::MatchProbs::write_bm_reliabilities($opts->{'tgtdir'}."/$results_dir/result.bmreliability-cbt",
                                                     $bmrels_seq,$bmrels_str);
        MLocarna::MatchProbs::write_am_reliabilities($opts->{'tgtdir'}."/$results_dir/result.amreliability-cbt",$amrels);

        MLocarna::MatchProbs::write_dotplot($opts->{'tgtdir'}."/$results_dir/reldot-cbt.ps",consensus_sequence($aln),$amrels);
    }

    ## hash for storing highly reliable structures (that will be used as constraints)
    my %reliable_structures;

    if ($reliabilities_single_sequences) {
        mkdir $opts->{'tgtdir'}."/$results_dir/$single_reliabilities_dir";
        ## compute reliability profiles for single sequences
        for my $name (sort {$a cmp $b} @names) {
            # printmsg 1,"Compute structure reliability profile for $name\n";

            ## compute and write reliability profiles for the single sequences
            my ($bmrels_seq,$bmrels_str) =
	        MLocarna::MatchProbs::compute_bmreliabilities_single_seq($name,$aln,$bmprobs_nocbt,$amprobs_nocbt,$nnorm);

            my $nname = $nnorm->nname($name);
            MLocarna::MatchProbs::write_bm_reliabilities(
                $opts->{'tgtdir'}."/$results_dir/$single_reliabilities_dir/$nname.bmreliability",
	        $bmrels_seq, $bmrels_str);


            ## compute and write arc match reliabilities for the single sequences
            my %name_pairprobs = read_pp_file_pairprobs($opts->{'tgtdir'}."/$input_dir/$nname");

            my $amrels = MLocarna::MatchProbs::compute_amreliabilities_single_seq(
                $name,$aln,$amprobs_nocbt,\%name_pairprobs,$nnorm);
            MLocarna::MatchProbs::write_am_reliabilities(
                $opts->{'tgtdir'}."/$results_dir/$single_reliabilities_dir/$nname.amreliability", $amrels);
            MLocarna::MatchProbs::write_dotplot(
                $opts->{'tgtdir'}."/$results_dir/$single_reliabilities_dir/$nname"."_reldot.ps", $aln->{$name},$amrels);

            my @empty=();
            my ($score,$rel_str)=MLocarna::MatchProbs::max_weight_structure(length($aln->{$name}),\@empty,$amrels,1,0.125);

            printmsg 3,sprintf("%-".$namewidth."s ",$name).$rel_str."\n";

            $rel_str = project_structure_to_alignment_sequence($rel_str,
                                                               $aln->{$name},
                                                               "-~.",
                                                               "({`",
                                                               ")}'",
                                                               "."
                                                              );
            $reliable_structures{$name} = $rel_str;
        }
    }

    return \%reliable_structures;
}

## chop string into parts of fixed length
##
## @param $s string
## @param $w length (<=0 means don't split)
## @return list of substrings
##
sub chop_string {
    my ($s,$w) = @_;

    if ($w<=0) {
        return ($s);
    }

    my @l=();
    while (length($s)>0) {
        push @l,substr($s,0,$w);
        if (length($s)>$w) {
            $s=substr($s,$w);
        } else {
            last;
        }
    }
    return @l;
}

## perform the complete multiple alignment for the given sequences
##
## performs the single steps:
## * generate target dir
## * compute of base pair probabilities
## * in probabilistic mode: compute alignment edge probabilities
## * optionally: consistency-transformation
## * compute all-2-all alignments
## * compute guide tree
## * perform progressive steps
## * optionally perform iterative refinement
##
##
## @param $seqs reference to fasta list of hashs
##
## GLOBAL VARS: amprobs, bmprobs, amprobs_nocbt, bmprobs_nocbt (since we need this later)
##
sub perform_multiple_alignment {
    my $seqs = shift;

    ## ----------------------------------------
    ## get names of sequences in seqs
    #
    my @names = map { $_->{name} } @$seqs;

    my $current_dir = abs_path(".");

    chdir $opts->{'tgtdir'};

    if ($opts->{'threads'}>1) {
       printmsg 2,"Compute using $opts->{'threads'} threads.\n";
    }

    ## ----------------------------------------
    # compute probability dot-plots
    # (if wanted or if necessary)
    #
    if ($opts->{'opt_no_bpp_precomputation'}) {
        printmsg 3,"Skip precomputation of dot plots.\n";
        generate_input_files_without_bpps($seqs);
    } else {
        printmsg 2,"Compute pair probs ...\n";
        compute_all_dotplots($seqs,$opts->{'threads'});
    }

    ## exit here if only dps shall be computed
    if ($opts->{'only-dps'}) {
        printmsg 3,"Computed only the dotplots (results written to $opts->{'tgtdir'}).\n";
        exit 0;
    }


    ## ----------------------------------------
    ## generate base match probabilities for all sequence pairs
    #  (if required)

    if ($opts->{'probabilistic'}) {

        printmsg 2,"Compute match probabilities ...\n";

        # compute the base match probabilites
        # either use full partition function locarna_p
        # or use the Gotoh PF implementation in locarna (in fast-mea mode)
        # compute arc match probs only if required
        #
        #
        compute_all_match_probs(\@names, $bmprobs_nocbt, $amprobs_nocbt, $opts->{'threads'});


        if ($opts->{'write-bm-probs'}) {
            MLocarna::MatchProbs::write_bm_probs("$probs_dir/bmprobs",$bmprobs_nocbt);
        }
        if ($opts->{'write-am-probs'}) {
            MLocarna::MatchProbs::write_am_probs("$probs_dir/amprobs",$amprobs_nocbt);
        }

        # print_k_dim_hash($bmprobs,4,"");

        if ($opts->{'consistency-transformation'}) {
            ## store untransformed probabilities

            printmsg 2, "Consistency transform match probabilities ...\n";

            $bmprobs = MLocarna::MatchProbs::consistency_transform_bm($bmprobs_nocbt,\@names, $nnorm);
            $amprobs = MLocarna::MatchProbs::consistency_transform_am($amprobs_nocbt,\@names, $nnorm);

            if ($opts->{'write-bm-probs'}) {
	        MLocarna::MatchProbs::write_bm_probs("$probs_dir/bmprobs-cbt",$bmprobs);
            }
            if ($opts->{'write-am-probs'}) {
	        MLocarna::MatchProbs::write_am_probs("$probs_dir/amprobs-cbt",$amprobs);
            }
        } else {
            $bmprobs=$bmprobs_nocbt;
            $amprobs=$amprobs_nocbt;
        }

        #print_k_dim_hash($bmprobs,4,"");
    }

    ## ----------------------------------------
    # generate guide tree (if not given)
    #

    my $tree;
    my $score_matrix;

    if (defined($opts->{'treefile'})) {

        printmsg 3,"Read guide tree from file. \n";

        open(my $TREE,"<","$opts->{'treefile'}");
        my $newick_string;
        while(<$TREE>) {$newick_string.=$_;}
        close $TREE;
        $tree = new MLocarna::Tree( $newick_string );
        $tree = $tree->binarize();
    } elsif (defined($opts->{'similarity-matrix'})) {

        $score_matrix = read_2D_matrix($opts->{'similarity-matrix'},int(@names),int(@names));

    } elsif (defined($opts->{'score-lists'})) {
        my $N = $opts->{'score-lists'};
        $score_matrix=[];

        my $total = @names * (@names - 1) / 2;

        my $duplicates = 0;
        foreach my $filename ( scorelist_files() ) {
            my $m = MLocarna::SparseMatrix::read_2D($filename);
            $duplicates = MLocarna::SparseMatrix::to_sym_array_check_2D(
                $m, $score_matrix, $duplicates);
        }
        if ($duplicates) {
            printmsg 3, "\nWarning: duplicates in score lists. Results may be COMPROMISED. Make sure to start from a clean target directory.\n\n";
        }

        # set diagonal to zero
        for (my $a=0; $a<@names; $a++) {
            $score_matrix->[$a][$a] = 0;
        }

        #check that all matrix elements are defined
        for (my $a=0; $a<@names; $a++) {
            for (my $b=0; $b<@names; $b++) {
                if (!defined($score_matrix->[$a][$b])) {
                    printerr("ERROR: Missing score for sequence pair ($a,$b) in score lists. Something went wrong in the computation of the pairwise scores.");
                    exit(-1);
                }
            }
        }
    } elsif (@names == 2) { # for only two sequences, tree is unique
        $tree = new MLocarna::Tree(
            [ new MLocarna::Tree( "LEAVE", $names[0] ),
              new MLocarna::Tree( "LEAVE", $names[1] )
            ]
        );
    } elsif ($opts->{'graphkernel'}) {
        ## RECALL: here the current directory is the target directory

        my @cmd=($opts->{'fasta2shrep'}, $opts->{'fasta2shrep-options'},
                 "-fasta" => "input/input.fa", "-tmp" => "graphkernel/tmp", "-o" => "graphkernel");
        print "Run @cmd\n";

        if (-d "graphkernel") {
            print "  Remove directory graphkernel\n";
            File::Path::rmtree "graphkernel";
        }

        mkdir "graphkernel";
        mkdir "graphkernel/tmp";

        ## Run fasta2shrep; writes output directory
        system(@cmd)==0 || die "Command @cmd failed: $!";

        File::Path::rmtree "graphkernel/tmp";

        ## write results to .gspan file
        system "bzcat graphkernel/*.bz2 > graphkernel/gspan";
        system "rm graphkernel/*.bz2";

        # note: the gspan format defines graphs
        @cmd = ("$opts->{'svmsgdnspdk'}", "-i" => "graphkernel/gspan",
                "-r" => "$opts->{'svmsgdnspdk-radius'}", "-d" => "$opts->{'svmsgdnspdk-distance'}",
                "-gt" => "DIRECTED", "-a" => "MATRIX");
        print "Run @cmd\n";

        system(@cmd)==0 || die "Command @cmd failed: $!";

        # system "bzip2 graphkernel/gspan"; ## keep the gspan file but compress it
        # OR
        unlink "graphkernel/gspan";

        ## name of the generated matrix file
        my $matrix_filename="graphkernel/gspan.mtx";

        ## read matrix file
        $score_matrix = read_2D_matrix($matrix_filename,int(@names),int(@names));

        ## convert it
        for my $i (0..@$score_matrix-1) {
            my $row=$score_matrix->[$i];
            for my $j (0..@$row-1) {
	        $score_matrix->[$i][$j] *= 10000;
            }
        }
    } elsif (defined($opts->{'compute-pairwise-scores'})) {
        # --------------------------------------------------
        # Compute a fraction of the scores for the guide tree and write to score list k/N
        #
        my $k = $opts->{'compute-pairwise-scores'}{'k'};
        my $N = $opts->{'compute-pairwise-scores'}{'N'};

        my $total = @names * (@names - 1) / 2;
        if ($k > $total) {
            printmsg 2,"Compute pairwise alignments, skip empty fraction $k of $N ... \n";
            exit 0;
        }

        printmsg 2,"Compute pairwise alignments, fraction $k of $N ... \n";

        my $score_hash = compute_all_pairwise_alignments(\@names, $bmprobs, $amprobs, $opts->{'threads'}, $k, $N);

        if ( ! -d scorelist_dirname() ) {
            mkdir scorelist_dirname();
        }
        MLocarna::SparseMatrix::write_2D($score_hash, scorelist_filename($k,$N));

        printmsg 2, "... finish after partial precomputation, as requested by --compute-pairwise-scores.\n";
        exit 0;
    } else {
        # --------------------------------------------------
        # compute score matrix for guide tree
        #

        printmsg 2,"Compute pairwise alignments ... \n";

        my $score_hash = compute_all_pairwise_alignments(\@names, $bmprobs, $amprobs, $opts->{'threads'});
        $score_matrix = MLocarna::SparseMatrix::to_sym_array_2D($score_hash);

        # set diagonal to zero
        for (my $a=0; $a<@names; $a++) {
            $score_matrix->[$a][$a] = 0;
        }

	# this did never really work. Here is the removed man page text:
	# =item  B<--extlib>
        # Use library extension for base pair probabilities (experimental/not functional).
	#
	# To reactivate it, we need a matrix of the pairwise aligner output
	#
	# if ($opts->{'extlib'}) {
        #     extend_library($pairwise_alns);
        # }
    }

    if (defined($score_matrix)) {
        ## write score matrix to the results directory
        write_2D_matrix("$results_dir/result.matrix",$score_matrix);
    }
    if (!defined($tree)) {
        if (!defined($score_matrix)) {
            printerr "ERROR: Don't know how to produce guide tree.";
            exit -1;
        }
        # compute tree from the score matrix
        $tree = new MLocarna::Tree("UPGMA",\@names,$score_matrix);
    }
    # end generation of tree


    open(my $TREE,">","$results_dir/result.tree");
    print $TREE $tree->to_newick().";\n";
    close $TREE;

    if (defined($opts->{'treefile'})) {
        ## check whether tree contains all names in the input file and project tree to these names

        if (! $tree->contains_leaves(\@names) ) {
            print STDERR "ERROR: given guide tree (--treefile) does not contain all names in the input.\n";
            exit(-1);
        }
        #print "Tree: ".$tree->to_newick().";\n";

        $tree = $tree->project(\@names);

        #print "Projected tree: ".$tree->to_newick().";\n";
    }

    ## ----------------------------------------
    # progressive alignment along the guide tree
    #
    printmsg 2, "Perform progressive alignment ...\n";

    my $intermediate_name_data = {}; # a hash of already assigned intermediary names
    my $current_alignment_name =
      perform_progressive_steps($tree,$bmprobs,$amprobs,$opts->{'max-alignment-size'},$intermediate_name_data);

    if (! defined $current_alignment_name) {
        if (defined $opts->{'max-alignment-size'}) {
            printmsg 3, "Only 'intermediary' alignments up to size $opts->{'max-alignment-size'} were computed.\n";
        } else {
            printmsg 3, "No alignment of all input sequences was computed.\n";
        }
        chdir "$current_dir";
        return;
    }

    ## ----------------------------------------
    ## copy results to files result_prog.aln and result_prog.pp
    ##
    copy( "$current_alignment_name.aln", "$results_dir/result_prog.aln" );
    copy( "$current_alignment_name.pp", "$results_dir/result_prog.pp" );
    if ($opts->{'stockholm'}) {
        copy( "$current_alignment_name.stk", "$results_dir/result_prog.stk" );
    }

    ### ============================================================
    ### iterative refinement

    ##
    ## 1.) from the guide tree get all bi-partitionings of seq-names,
    ##     that are separated by tree edges
    ## 2.) start with the final multiple alignment from the progressive phase
    ##     as the current multiple alignment
    ## 3.) for each partitioning
    ##     a) extract the multiple alignments of both partitions out of the current ma
    ##     b) generate new pp files with newly computed consensus pairprobs
    ##     c) align both paritions in order to form a new, better ma
    ##
    ##
    ## Details:
    ## ad 3a) read the current ma and generate the two sub-alignments
    ##        by projection, delete only-gap columns
    ## ad 3b) read all pp files of the single sequences in the sub-alignment
    ##        and compute consensus dot-plot by averaging probs
    ##        (geometric or arithmetic mean).
    ##        This requires two compute the projection of the single sequences
    ##        to their lines in the alignment.
    ##
    ## Handling of constraints?
    ## also: merge constraint lines from the single pp files
    ##


    if (defined($opts->{'iterate'}) && ! defined($opts->{'iterations'})) {
        $opts->{'iterations'}=1;
    }

    if (defined($opts->{'iterations'}) && ($#names==1)) {
        $opts->{'iterations'} = 0;
    }

    if (defined($opts->{'iterations'}) && ($opts->{'iterations'}>0)) {
         printmsg 2, "Perform iterative refinement ...\n";

        $current_alignment_name =
            perform_iterative_refinement($current_alignment_name,$tree,\@names,$bmprobs,$amprobs,$intermediate_name_data);
    }

    ### DONE iterative refinement
    ### ============================================================


    ## ----------------------------------------
    # copy results to files result.aln and result.pp
    # (even for non-iterative alignment the result-files are written only here)
    #
    copy( "$current_alignment_name.aln", "$results_dir/result.aln" );
    copy( "$current_alignment_name.pp", "$results_dir/result.pp" );
    if ($opts->{'stockholm'}) {
        copy( "$current_alignment_name.stk", "$results_dir/result.stk" );
    }

    ########################################
    ## rewrite the result alignment result.aln: sort
    ##
    my $aln = read_clustalw_alnloh("$results_dir/result.aln");

    if ($opts->{'keep-sequence-order'}) {
        $aln = loh_sort($aln,loh_names $seqs);
    }

    open(my $ALNOUT, ">", "$results_dir/result.aln");

    my $namewidth=write_clustalw_alnloh($ALNOUT,$aln,$opts->{'width'});
    close $ALNOUT;

    ## ----------
    ## write the result alignment to standard output
    open(my $alnfh, "<", "$results_dir/result.aln");
    while (<$alnfh>) { if (!/^CLUSTAL/) { printmsg 3, $_; } }
    close $alnfh;

    ## ----------------------------------------
    ## compute and show alifold prediction

    my $alistr = alifold_structure("$results_dir/result.aln", @RNAfold_args);
    chomp $alistr;

    if ($alistr eq "") {
        printmsg 3, "\nWARNING: cannot report consensus structure (RNAalifold call failed).\n";
    }

    if ($alistr=~/([^\s]*)\s(.*)/) {
        my $name="alifold";
        for my $s (chop_string($1,$opts->{'width'})) {
            printmsg 3, sprintf("\n%-".$namewidth."s %s",$name,$s);
            $name="";
        }
        printmsg 3, " ".$2."\n";
    }

    # in probabilistic mode compute and store alifold dotplot
    if ($opts->{'probabilistic'}) {
        alifold_pf("$results_dir/result.aln", @RNAfold_args);
        rename "alidot.ps", "$results_dir/alidot.ps";
    }

    rename "alirna.ps", "$results_dir/alirna.ps";
    rename "aln.ps", "$results_dir/aln.ps";

    chdir $current_dir;
}


### ============================================================================
### SUBS
###


## compute base and arc match probs for all pairs of names
## we assume that for each name there is a pp file
##
## @param $names list of all sequence names
## @param $bmprobs hash of base match probabilities for all sequence pairs
## @param $amprobs hash of arc match probabilties for all sequence pairs
## @thread_num number of parallel threads
##
## in fast-mea mode:
## for computing the probabilities, locarna is called with
## option --write-match-probs
##
## in pf mode
## for computing the probabilities, locarna is called with
## option --write-basematch-probs and --write-arcmatch-probs
##
## writes results to the hashs bmprobs and amprobs
sub compute_all_match_probs {
    my ($names, $bmprobs, $amprobs, $thread_num) = @_;

    if (!defined($thread_num)) { $thread_num = 1; }

    my @arg_lists_par;
    my @arg_lists_seq;

    my $a=0;

    ## collect argument lists
    my $num=0;
    for my $nameA (@$names) {
        $a++;
        my $b=0;
        for my $nameB (@$names) {
            $b++;
            if ($b!=$a) {
	        if ($a>$b) {
	            $num++;
	            push @arg_lists_par, [ ($num,$a,$b,$nameA,$nameB) ];
	        }
            }
        }
    }

    share($bmprobs);
    share($amprobs);

    my $num_names = scalar(@$names);

    ## perform in parallel
    foreach_par(
        sub {
            my ($num,$a,$b,$nameA,$nameB)=@_;
            printmsg 2, sprintf(
	        "PFAlign %d/%d : $nameA and %d/%d: $nameB (%d/%d)\n",
	        $a,$num_names,$b,$a-1,
	        $num,($num_names*($num_names-1))/2);

            compute_match_probs($a,$b,$nameA,$nameB,$bmprobs,$amprobs);
            get_symmetric_match_probs($b,$a,$nameB,$nameA,$bmprobs,$amprobs);
        },
        \@arg_lists_par,
        $thread_num);
}

sub get_symmetric_match_probs {
    # get match probs entries from the symmetric case
    my ($a,$b,$nameA,$nameB,$bmprobs,$amprobs) = @_;

    if (! exists $bmprobs->{$nnorm->nnamepair($nameB,$nameA)}) {
        printerr "Internal ERROR: symmetric match probabilities don't exist yet.\n";
        exit -1;
    }

    $bmprobs->{$nnorm->nnamepair($nameA,$nameB)} = threads::shared::shared_clone(
        MLocarna::SparseMatrix::transpose_2D( $bmprobs->{$nnorm->nnamepair($nameB,$nameA)} ) );

    $amprobs->{$nnorm->nnamepair($nameA,$nameB)} = threads::shared::shared_clone(
        MLocarna::SparseMatrix::transpose_4D( $amprobs->{$nnorm->nnamepair($nameB,$nameA)} ) );
}

## compute the match probabilities from a single pairwise alignment
## @param $nameA name of the first sequence
## @param $nameB name of the second sequence
## @param[out] $bmprobs hash of all base match probs (for all pairwise alignments)
## @param[out] $amprobs hash of all arc match probs (for all pairwise alignments)
##
## calls binary (locarna_p, or locarna if fastmea) to compute and write probabilities to files. The files are read again and
## probabilities are written into the hashs $bmprobs, $amprobs
##
sub compute_match_probs {
    # compute the match probabilities for the pair
    my ($a,$b,$nameA,$nameB,$bmprobs,$amprobs) = @_;

    my $tmpprefix = threadsafe_name("$global_tmpprefix");

    my $tmpfile_bm="$tmpprefix.bmps";
    my $tmpfile_am="$tmpprefix.amps";

    my @ref_aln_option = pairwise_reference_alignment_option($nameA,$nameB,$refaln,$refaln_filename);

    my $nnameA=$nnorm->nname($nameA);
    my $nnameB=$nnorm->nname($nameB);

    my @cmd = ();
    if ($opts->{'fast-mea'}) {
        push @cmd, $opts->{'pw-aligner'}, "$input_dir/$nnameA", "$input_dir/$nnameB",
          @locarna_params, "--write-match-probs" => "$tmpfile_bm";
    } else {
        push @cmd, $opts->{'pw-aligner-p'}, "$input_dir/$nnameA", "$input_dir/$nnameB",
          @locarna_p_params, "--write-basematch-probs" => "$tmpfile_bm";

        if ( ! $opts->{'only-basematch-probs'} ) {
        push @cmd, "--write-arcmatch-probs" => "$tmpfile_am";
        }
    }

    push @cmd, @ref_aln_option; ## add option for reference alignment (potentially empty)

    push @cmd, "-q" unless $opts->{'moreverbose'};

    printmsg 1, "@cmd\n";
    system(@cmd)==0 || die_hard( "Cannot compute match probabilities by @cmd: $!" );

    if ((! -e $tmpfile_bm) || (! -e $tmpfile_am)) {
        die_hard "Pairwise aligner failed to write match probabilities to file; called by @cmd.";
    }

    $bmprobs->{$nnorm->nnamepair($nameA,$nameB)} = threads::shared::shared_clone(
        MLocarna::SparseMatrix::read_2D($tmpfile_bm));
    $amprobs->{$nnorm->nnamepair($nameA,$nameB)} = threads::shared::shared_clone(
        MLocarna::SparseMatrix::read_4D($tmpfile_am));
    unlink "$tmpfile_am";
    unlink "$tmpfile_bm";
}

## calls pairwise aligner ($opts->{'pw-aligner'}) on nameA and nameB in
## input_dir and writes to $tgt.pp and $tgt.aln
sub call_locarna {
    my ($inA,$inB,$tgt) = @_;

    printmsg 2, "Align $inA + $inB --> $tgt\n";

    my @cmd = ();
    push @cmd, $opts->{'pw-aligner'}, $inA, $inB, @locarna_params, "--clustal=$tgt.aln";

    if ($opts->{'stockholm'}) {
        push @cmd, "--stockholm=$tgt.stk";
    }

    push @cmd, "--pp=$tgt.pp";

    push @cmd, reference_alignment_option($refaln_filename);

    push @cmd, "-q" unless $opts->{'verbose'} or $opts->{'moreverbose'};

    printmsg 1, "@cmd\n\n";
    system(@cmd)==0 || die "Command @cmd failed: $!";

    if ((! -e "$tgt.aln") || (! -e "$tgt.pp")) {
        die "Pairwise aligner ($opts->{'pw-aligner'}) failed to write alignment to file";
    }

    ## print the output alignment
    #if ($opts->{'verbose'} and not $opts->{'moreverbose'}) {
    #    open(my $alnfh, "<", "$tgt.aln");
    #    while(<$alnfh>) { if (!/^CLUSTAL/) { print $_; } }
    #    close $alnfh;
    #    print "\n";
    #}
}


## call locarna for aligning alignments in pp files
## pass averaged base and (if required) arc match probs to locarna
## (unless in fast-mea mode, the pair probabilities in the pp files are ignored)
##
sub call_locarna_prob {
    my ($inA,$inB,$tgt,$bmprobs,$amprobs) = @_;

    ## extract the alignments from the pp files
    #
    my %alnA = read_pp_file_aln_wo_anno($inA);
    my %alnB = read_pp_file_aln_wo_anno($inB);

    my $tmpprefix = threadsafe_name("$global_tmpprefix");

    # average the base match probabilities and write to tmpfile
    my $m = MLocarna::MatchProbs::average_basematch_probs( \%alnA, \%alnB, $bmprobs, $nnorm );
    my $tmpfile_bm="$tmpprefix.bmps";
    MLocarna::SparseMatrix::write_2D($m,$tmpfile_bm);
    my @readprob_args = ("--read-match-probs" => $tmpfile_bm);

    my $tmpfile_am="$tmpprefix.amps";
    # if required, average the arc match probabilities and write to tmpfile
    if (!$opts->{'fast-mea'} && !$opts->{'only-basematch-probs'}) {
        my $m = MLocarna::MatchProbs::average_arcmatch_probs( \%alnA, \%alnB, $amprobs, $nnorm );
        MLocarna::SparseMatrix::write_4D($m,$tmpfile_am);

        push @readprob_args, "--read-arcmatch-probs" => $tmpfile_am;
    }

    printmsg 2, "Align $inA + $inB --> $tgt\n";
    my @cmd = ();
    push @cmd, $opts->{'pw-aligner'}, $inA, $inB, @locarna_params,
      "--pp=$tgt.pp", "--clustal=$tgt.aln", @readprob_args;

    if ($opts->{'stockholm'}) {
        push @cmd, "--stockholm=$tgt.stk";
    }

    push @cmd, reference_alignment_option($refaln_filename);

    push @cmd, "-q" unless $opts->{'moreverbose'};

    printmsg 1, "@cmd\n\n";
    system(@cmd)==0 || die "Command @cmd failed: $!";

    if ((! -e "$tgt.aln") || (! -e "$tgt.pp")) {
        die "Pairwise aligner ($opts->{'pw-aligner'}) failed to write alignment to file";
    }

    if ($opts->{'verbose'} and not $opts->{'moreverbose'}) {
        open(my $alnfh, "<", "$tgt.aln");
        while(<$alnfh>) { if (!/^CLUSTAL/) { print $_; } }
        close $alnfh;
        print "\n";
    }

    if (!$opts->{'fast-mea'} && !$opts->{'only-basematch-probs'}) {
        unlink $tmpfile_am;
    }
    unlink $tmpfile_bm;
}

########################################
# write pp file out of the alignment projected to
# the given names
# generates consensus dotplot and constraints
# out of single pp files
sub write_pp_projected {
    my ($aln, $names, $tgt_name) = @_;

    my $alnP = project_aln($aln,$names);

    ## compute the consensus dot plot
    my @consensus_dp; # 2D array
    my @size; # number of added values for pos $i,$j

    foreach my $name (@$names) {
        my @posmap = project_seq($alnP->{$name});
        my %name_aln =
            read_pp_file_aln_w_anno("$input_dir/".$nnorm->nname($name));
        my %name_pairprobs =
            read_pp_file_pairprobs("$input_dir/".$nnorm->nname($name));

        if ( exists $name_aln{"#C"} && (!$opts->{'ignore-constraints'})) {
            # my $constraints = $name_aln{"#C"};
            printerr "WARNING: constraints ignored in iterative alignment.\n";
        }

        # for computing the geometric mean, we don't allow 0 values,
        # since this extinguishs the probability completely!
        # therfore we use some very small 'background' probability
        # if $p is too low.
        #
        # In principle, copy the magic spell from alignment.cc
        #
        my $minimal_p = $opts->{'min-prob'}*0.1;
        if (defined($opts->{'exp-prob'}) && $opts->{'exp-prob'}<$minimal_p ) {
            $minimal_p = $opts->{'exp-prob'};
        }

        for my $pair (keys %name_pairprobs) {
            $pair =~ /(\d+) (\d+)/;
            my $i=$1;
            my $j=$2;
            my $p = $name_pairprobs{$pair};

            if ($p<$minimal_p) {$p=$minimal_p;}

            $consensus_dp[$posmap[$i]][$posmap[$j]] += log($p);
            $size[$posmap[$i]][$posmap[$j]] ++;
        }
    }


    # normalize consensus_dp
    my $len  = aln_length( $alnP );

    for (my $i=1; $i<=$len; $i++) {
        for (my $j=$i+1; $j<=$len; $j++) {
            if (defined($consensus_dp[$i][$j])) {
	        $consensus_dp[$i][$j] = exp($consensus_dp[$i][$j] / $size[$i][$j]);
            }
        }
    }

    write_pp($tgt_name,$alnP,\@consensus_dp,$opts->{'min-prob'});

    return;
}



sub extend_library {
    my ($pairwise_alns,$names) = @_;

    ### ------------------------------------------------------------
    ## compute "extended pairprobability library"
    #

    my @pairprobs;
    my @sequences;

    for (my $a=0; $a<@$names; $a++) {
        my $seqA;
        my $pairprobsA_ref;
        ($seqA,$pairprobsA_ref) = read_dp_ps("$names->[$a]");

        $sequences[$a] = $seqA;
        $pairprobs[$a] = [ @{ $pairprobsA_ref } ];
    }

    printmsg 3, "Compute extended pairprobability library ...\n";

    ## make a deep copy of pair probabilities
    #
    my @ext_pairprobs;
    for (my $a=0; $a<@$names; $a++) {
        my @eppA;
        my @ppA = @{ $pairprobs[$a] };
        for my $i ( 0..$#ppA ) {
            if (defined $ppA[$i]) {
	        my @ppAa = @{ $ppA[$i] };
	        for my $j ( 0..$#ppAa ) {
	            if (defined $ppAa[$j]) {
		        $eppA[$i][$j] = $ppA[$i][$j];
	            }
	        }
            }
        }
        $ext_pairprobs[$a] = [ @eppA ];
    }

    for (my $a=0; $a<@$names; $a++) {
        printmsg 1, "Compute ext lib for $names->[$a]\n";

        my @ext_pairprobsA = @{ $ext_pairprobs[$a] }; # we accummulate the extension!

        for (my $b=0; $b<@$names; $b++) {
            if ($a != $b) {

	        my $aliA; # alignment-string for sequence a
	        my $aliB; # alignment-string for sequence b

	        if ($a>$b) {
	            my $alnAB = $pairwise_alns->[$a][$b];
	            ($aliA,$aliB) = extract_from_clustal_alignment($names->[$a],$names->[$b],$alnAB);
	        } else {
	            my $alnBA = $pairwise_alns->[$b][$a];
	            ($aliB,$aliA) = extract_from_clustal_alignment($names->[$b],$names->[$a],$alnBA);
	        }

	        my @sp2apA = seqpos_to_alipos($aliA);
	        my @sp2apB = seqpos_to_alipos($aliB);

	        my @ap2spA = alipos_to_seqpos($aliA);
	        my @ap2spB = alipos_to_seqpos($aliB);

	        my $lengthA=$#sp2apA+1;
	        my $lengthB=$#sp2apB+1;

	        printmsg 1, "  $a:$names->[$a] ($lengthA) --- $b:$names->[$b]\n";

	        my @pairprobsB = @{ $pairprobs[$b] };

	        for (my $iA=0; $iA<$lengthA; $iA++) {
	            for (my $jA=$iA+1; $jA<$lengthA; $jA++) {
		        my $pA = (defined $ext_pairprobsA[$iA][$jA])?$ext_pairprobsA[$iA][$jA]:$opts->{'min-prob'};
		        if ($pA > $opts->{'min-prob'}) {
		            my $iB = $ap2spB[$sp2apA[$iA]];
		            my $jB = $ap2spB[$sp2apA[$jA]];

		            if ($iB != -1 && $jB != -1) {
			        # print "$iA,$jA <--> $sp2apA[$iA],$sp2apA[$jA] <--> $iB,$jB\n";
			        my $pB = (defined $pairprobsB[$iB][$jB])?$pairprobsB[$iB][$jB]:$opts->{'min-prob'};
			        if ($pB>$opts->{'min-prob'}) {
			            my $new_pA = exp( log($pA) * (1/(1+$pB)));
			            $ext_pairprobsA[$iA][$jA] = $new_pA;
			            my @orig_pairprobsA=@{ $pairprobs[$a] };
			            my $orig_pA  = $orig_pairprobsA[$iA][$jA];
			            printmsg 1, "Enhance ($iA,$jA):$pA by ($iA,$jB):$pB --> $new_pA/$orig_pA\n";
			        }
		            }
		        }
	            }
	        }
            }
        }
    }

    ## --------------------------------------------------
    ## write pp files with extended probabilities
    #

    for (my $a=0; $a<@$names; $a++) {
        # print "write extended probs for $names->[$a]\n";

        my $seq=$sequences[$a];
        my $len=length($seq);
        my @ext_pairprobs = @{ $ext_pairprobs[$a] }; # the extended probs


        open(my $PP_OUT, ">", "$names->[$a].pp");

        print $PP_OUT "$names->[$a] $seq\n\n#\n";

        for (my $i=0; $i<$len; $i++) {
            for (my $j=$i+1; $j<$len; $j++) {
	        if (defined $ext_pairprobs[$i][$j]) {
	            my $p = $ext_pairprobs[$i][$j];
	            if ($p > $opts->{'min-prob'}) {
		        print $PP_OUT "$i $j $p\n";
	            }
	        }
            }
        }
        close $PP_OUT;
    }

    #
    # print "\n\n";
    #

    for (my $a=0; $a<@$names; $a++) {
        rename "$names->[$a].pp", "$names->[$a]";
    }

    #
    ##
    ## --------------------------------------------------
}


## compute the alignment from dp-files or pp-files
sub compute_alignment_from_dps {
    my ($input_dir,$nameA,$nameB,$locarna_params) =  @_;

    my $tmpprefix = threadsafe_name("$global_tmpprefix");

    my $tmpfile="$tmpprefix.clustal";

    my $seqA_dp="$input_dir/".$nnorm->nname($nameA);
    my $seqB_dp="$input_dir/".$nnorm->nname($nameB);

    my @cmd = ();
    push @cmd, $opts->{'pw-aligner'}, $seqA_dp, $seqB_dp,
      @$locarna_params,
      "--clustal" => $tmpfile,
      pairwise_reference_alignment_option($nameA,$nameB,$refaln,$refaln_filename);

    push @cmd, "-q" unless $opts->{'moreverbose'};

    system(@cmd)==0 || die_hard "Command @cmd failed: $!";

    open(my $CA_IN, "<", "$tmpfile") ||
        die_hard
        "\nPairwise aligner failed to write alignment ($tmpfile.)\n  Failed call:\n    @cmd\n";

    my @content=<$CA_IN>;

    close $CA_IN;

    unlink $tmpfile;

    return @content;
}

## compute alignment with given probabilities
sub compute_alignment_from_dps_probs {
    my ($input_dir,$nameA,$nameB,$locarna_params,$bmprobs,$amprobs) =  @_;

    # printerr "compute_alignment_from_dps_probs $seqA_dp $seqB_dp";

    my $tmpprefix = threadsafe_name("$global_tmpprefix");

    my $tmpfile  = "$tmpprefix.clustal";
    my $tmpfile2 = "$tmpprefix.bmprobs";
    my $tmpfile3 = "$tmpprefix.amprobs";

    my $seqA_dp="$input_dir/".$nnorm->nname($nameA);
    my $seqB_dp="$input_dir/".$nnorm->nname($nameB);

    MLocarna::SparseMatrix::write_2D($bmprobs,$tmpfile2);

    my @locarna_params = ();
    push @locarna_params, @$locarna_params;

    push @locarna_params, "--clustal=$tmpfile", "--read-match-probs=$tmpfile2";

    if (!$opts->{'fast-mea'} && !$opts->{'only-basematch-probs'}) {
        MLocarna::SparseMatrix::write_4D($amprobs,$tmpfile3);

        push @locarna_params, "--read-arcmatch-probs" => $tmpfile3;
    }

    my @cmd = ();
    push @cmd, $opts->{'pw-aligner'}, $seqA_dp, $seqB_dp, @locarna_params;

    push @cmd, "-q" unless $opts->{'moreverbose'};

    push @cmd, pairwise_reference_alignment_option($nameA,$nameB,$refaln,$refaln_filename);

    system(@cmd)==0 || die_hard "Command @cmd failed: $!";

    open(my $CA_IN, "<", "$tmpfile") || die_hard "compute_alignment_from_dps_probs: Cannot read temporary file $tmpfile: $!";

    if (!$opts->{'fast-mea'} && !$opts->{'only-basematch-probs'}) {
        unlink $tmpfile3;
    }
    unlink $tmpfile2;

    my @content=<$CA_IN>;

    close $CA_IN;

    unlink $tmpfile;

    return @content;
}

## ----------------------------------------
## compute the dotplots of all sequences
## of given names in a given sequences hash
##
## write to files in the current directory
## (filenames equal the sequence names)
##
## ------------------------------
## Dependency on GLOBAL VARIABLES
##
## $opts->{'dp-cache'}   if defined, use as cache directory
##             in order to avoid multiple computation of bp probs
## $opts->{'skip-pp'}    if true, skip computation of bp-probs for RNAs
##             where the corresponding file already exists in
##             the current directory
## $opts->{'rnafold-parameter'}
## $opts->{'rnafold-temperature'}
##
## $opts->{'verbose'}
## $opts->{'plfold-span'}
## $opts->{'plfold-winsize'}
##
sub compute_all_dotplots {
    my ($seqs,$thread_num) = @_;

    if (!defined($thread_num)) { $thread_num = 1; }

    my @arg_list;
    foreach my $i (0..@$seqs-1) {
        push @arg_list, [ ($i) ];
    }

    foreach_par(sub { my ($i)=@_; compute_dotplot($i,$seqs) },
            \@arg_list,
            $thread_num);
}

## ----------------------------------------
## generate input files without base pair probabilties
sub generate_input_files_without_bpps {
    my ($seqs) = @_;
    for my $i (0..@$seqs-1) {
        my $name = $seqs->[$i]->{name}; # the name of the sequence
        my $ppname="$input_dir/".$nnorm->nname($name);
        my $seq_str = $seqs->[$i]->{seq}; ## the sequence string

        if (! $opts->{'ignore-constraints'}) {
            my $constraints = anchor_constraint_string($seqs->[$i]);
            if (defined($constraints) && ($constraints ne "")) {
	        printerr( "WARNING: Constraints found for sequence $i, "
                  ."but not supported without precomputation of base pair probabilities\n.");
            }
        }

        ## write a simple fasta file with the input file name
        open(my $fh, ">", "$ppname");

        print $fh ">$name\n";
        print $fh "$seq_str\n";

        close ($fh);
    }
}

############################################################
## compute dotplot using locarna_rnafold_pp
##
## @param $ppfile output file name
## @param $seq sequence record
##
## @result pp file "$ppfile"
sub compute_dotplot_rnafold_pp {
    my ($ppfile, $seq) = @_;

    my @fold_cmd = ( "$bindir/locarna_rnafold_pp", "-p" => $opts->{'min-prob'} );

    if ($opts->{'noLP'}) {
        push @fold_cmd, "--noLP";
    }

    if ($opts->{'in-loop-probabilities'}) {
        push @fold_cmd, "--in-loop";
    }
    if ( $opts->{'stacking'} || $opts->{'new-stacking'} ) {
        push @fold_cmd, "--stacking";
    }
    if (defined($opts->{'maxBPspan'}) && $opts->{'maxBPspan'} ne "-1") {
        push @fold_cmd, "--maxBPspan" => $opts->{'maxBPspan'};
    }

    push @fold_cmd, "-o" => "$ppfile";

    my $seqname = $seq->{name};
    my $seq_str = $seq->{seq}; ## the sequence string

    my $input = "$seqname $seq_str\n";

    if ( not $opts->{'ignore-constraints'} ) {
        my $structure_constraints = $seq->{"ANNO#S"};
        if ( defined($structure_constraints) ) {
            $input = $input."\#S $structure_constraints\n";
            push @fold_cmd, "-C";
        }

        my $anchor_constraints = anchor_constraint_string($seq);
        if ( $anchor_constraints ne "" ) {
            my $i=0;
            $input .= join("\n", map {$i++;"#A$i ".$_;} split('#', "$anchor_constraints"));
        }
    }

    system_pipein_list($input, @fold_cmd);
}

########################################
## convert dp file to pp file and delete RNA(pl)fold files
##
## @param dpname base name of dot plot files
## @param ppfile name of output pp file
## @param $seq sequence record
##
sub convert_and_move_dpfile_to_pp {
    my ($dpname,$ppfile,$seq) = @_;

    my @args = ($ppfile);
    push @args, $seq->{name};
    push @args, $seq->{seq};
    push @args, anchor_constraint_string($seq);

    if ( -e "$dpname\_dp2.ps" ) {
        convert_dp_to_pp_with_constraints("$dpname\_dp2.ps", @args, 1);
    } else {
        convert_dp_to_pp_with_constraints("$dpname\_dp.ps", @args, 0);
    }

    unlink "$dpname\_ss.ps" || printerr "Cannot delete $dpname\_ss.ps\n";
    unlink "$dpname\_dp.ps" || printerr "Cannot delete $dpname\_dp.ps\n";
    if ( -e "$dpname\_dp2.ps" ) {
        unlink "$dpname\_dp2.ps" || printerr "Cannot delete $dpname\_dp2.ps\n";
    }
}


############################################################
## compute dotplot using RNAfold
##
## @param $ppfile output file name
## @param $seq sequence record
##
## @result pp file "$ppfile"
sub compute_dotplot_rnafold {
    my ($ppfile,$seq) = @_;

    my $tmpname = $nnorm->nname($seq->{name});

    my @fold_cmd = ("$RNAfold", "-p2");
    push  @fold_cmd, @RNAfold_args;

    if ($opts->{'noLP'}) {
        push @fold_cmd, '--noLP';
    }

    my $seq_str = $seq->{seq}; ## the sequence string

    my $input = ">$tmpname\n$seq_str\n";

    my $structure_constraints = $seq->{"ANNO#S"};
    if ( not $opts->{'ignore-constraints'}
         and defined($structure_constraints) ) {
        $input = $input."$structure_constraints\n";
        push @fold_cmd, "-C";
    }

    system_pipein_list($input,@fold_cmd);

    convert_and_move_dpfile_to_pp($tmpname, $ppfile, $seq)
}

############################################################
## compute dotplot using RNAplfold
##
## @param $ppfile output file name
## @param $seq sequence record
##
## ignores structure constraints
sub compute_dotplot_rnaplfold {
    my ($ppfile,$seq) = @_;

    my $tmpname = $nnorm->nname($seq->{name});

    my $seq_str = $seq->{seq}; ## the sequence string

    ##
    ## fix -W and -L parameters to avoid warnings of
    ## RNAplfold if window size is too large for sequence.
    ##
    my @fold_cmd = ();

    push @fold_cmd,
      "RNAplfold",
      "-L" => $opts->{'plfold-span'},
      "-W" => $opts->{'plfold-winsize'}, @RNAfold_args;

    if ($opts->{'noLP'}) {
        push @fold_cmd, '--noLP';
    }

    my $slen=length($seq_str);
    my $win = $slen;
    for (my $i=0; $i < @fold_cmd; $i++) {
        if ($fold_cmd[$i] eq "-W") {
            $i++;
            $win=$fold_cmd[$i];
            if ($win>$slen) {
                $fold_cmd[$i] = $slen;
                $win=$slen;
            }
        }
    }
    for (my $i=0; $i < @fold_cmd; $i++) {
        if ($fold_cmd[$i] eq "-L") {
            $i++;
            if ($fold_cmd[$i] > $win) {
                $fold_cmd[$i] = $win;
            }
        }
    }

    my $input = ">$tmpname\n$seq_str\n";
    system_pipein_list($input, @fold_cmd);

    convert_and_move_dpfile_to_pp($tmpname, $ppfile, $seq)
}

############################################################
## compute_dotplot(i,seqs)
## compute probability dotplot for the $ith sequence in $seqs (using RNAfold/RNAplfold/locarna_rnafold_pp)
##
sub compute_dotplot {
    ## register delegates here
    my %delegate_compute_dotplot =
      (
       "locarna_rnafold_pp" => \&compute_dotplot_rnafold_pp,
       "RNAfold" => \&compute_dotplot_rnafold,
       "RNAplfold" => \&compute_dotplot_rnaplfold,
      );

    my ($i,$seqs)=@_;

    ## the ith sequence
    my $seq = $seqs->[$i];

    # the (real) name of the sequence
    my $seqname = $seq->{name};

    my $seq_str = $seq->{seq};

    ## the file system conformant name of the sequence
    ## @note this name is used as base for files in the cache!
    my $seqfilename=$seq->{name};
    $seqfilename =~ s/[^a-zA-Z0-9]/_/g; # replace special characters by "_"

    ## the file system conformant normalized name of the sequence
    ## @note this name is used as base for files in the result directory!
    my $seqnormname=$nnorm->nname($seqname);

    ## name of the resulting dot plot file in pp format
    my $ppfile="$input_dir/".$seqnormname;

    my $anchor_constraints;
    if (! $opts->{'ignore-constraints'}) {
        $anchor_constraints = anchor_constraint_string($seqs->[$i]);
    }

    ## ------------------------------
    ## SKIPPING
    ## decide, whether to fold at all (or skip folding)
    ##
    ## Implement optional skipping of input dot plot files:
    ## if skip_pp is given and one of the files $ppname, $ppname\_dp2.ps, or $ppname\_dp.ps
    ## is available, no new dot plot is computed.
    ## Test in the above order, so $ppname overrides $ppname\_dp2.ps overrides or $ppname\_dp.ps
    ##
    ## The latter two cases allow using anchor constraints given in the input mfasta file
    ##
    if ( $opts->{'skip-pp'} ) {
        if ( -e "$ppfile" )  {
            # assume that the pp file is in pp format
            print "Skip $seqname\n" if $opts->{'moreverbose'};
            # skip
        } elsif ( -e "$ppfile\_dp2.ps" ) {
            # assume that the file is in dp2 format. Therefore, convert to pp.
            print "Skip $seqname, convert from $ppfile\_dp2.ps\n"  if $opts->{'moreverbose'};
            convert_dp_to_pp_with_constraints("$ppfile\_dp2.ps", $ppfile,
                                              $seqname, $seq_str,
                                              $anchor_constraints, 1);
        } elsif ( -e "$ppfile\_dp.ps" ) {
            # assume that the file is in dp format. Therefore, convert to pp.
            print "Skip $seqname, convert from $ppfile\_dp.ps\n"  if $opts->{'moreverbose'};
            convert_dp_to_pp_with_constraints("$ppfile\_dp.ps", $ppfile,
                                              $seqname, $seq_str,
                                              $anchor_constraints, 0);
        }
        return;
    }

    ## ------------------------------
    ## CACHING
    ## if caching is enabled, use cached dot plots
    if ( defined( $opts->{'dp-cache'} ) ) {
        my $cached=0; ## whether dot plot could be retrieved from cache

        my $dpname="$opts->{'dp-cache'}/$seqfilename";

        if ( -e "$dpname" ) {
            copy($dpname, $ppfile);
            $cached=1;
        } else {
            # does the file contain conditional probabilities (stacking)
            my $has_cond_probs = 0;

            if ( -e $dpname."_dp2.ps" ) {
                $cached = 1;
                $has_cond_probs = 1;
                $dpname = $dpname."_dp2.ps";
            } elsif ( -e $dpname."_dp.ps" ) {
                $cached = 1;
                $dpname = $dpname."_dp.ps";
            }

            if ($cached) {
                convert_dp_to_pp_with_constraints($dpname,$ppfile,$seqname,
                                                  $seq_str,$anchor_constraints,
                                                  $has_cond_probs);
            }
        }

        if ($cached) {
            return;
        }
    }


    ## ------------------------------
    ## FIXED STRUCTURE
    ## @note fixed structure #FS overrides structure #S
    # case: fixed structure constraints are given
    if ((!$opts->{'ignore-constraints'})
        && exists $seqs->[$i]->{"ANNO#FS"}) {
        if ($opts->{'verbose'}) {
            print "Fixed structure given for sequence "
              .$seqs->[$i]->{'name'};
            print " (ignore local folding)" if ($opts->{'plfold-span'});
            print ".\n";
        }
        my $structure = $seqs->[$i]->{"ANNO#FS"};
        convert_fix_structure_to_pp($ppfile,$seqname,$seq_str,
                                    $structure,$anchor_constraints);
        return;
    }
    ## ------------------------------


    ## NOTE: only from this point, we actually perform some folding


    ## ------------------------------
    ## select folding tool, based on requested vs supported features;
    ## preferably use rnafold_pp
    ##
    my $folding_tool = "locarna_rnafold_pp";

    my $local_folding_requested = defined($opts->{'plfold-span'});

    ## check whether rnafold_pp cannot be used for any reason
    my $rnafold_pp_applicable =
      @RNAfold_args == 0
      && ! $local_folding_requested ;

    my $structure_constraints_given =
      exists $seqs->[$i]->{"ANNO#FS"}
      || exists $seqs->[$i]->{"ANNO#S"};

    if (! $rnafold_pp_applicable) {
        if ($opts->{'in-loop-probabilities'}) {
            print STDERR "Warning: cannot precompute in-loop probabilities.\n";
        }

        if ($local_folding_requested
            and not exists $seqs->[$i]->{"ANNO#S"}
           ) {
            $folding_tool = "RNAplfold";
        } else {
            $folding_tool = "RNAfold";
        }
    }
    ## ------------------------------

    ## warn about if in loop probabilities are requested but cannot be
    ## computed
    if (! $rnafold_pp_applicable && $opts->{'in-loop-probabilities'}) {
        print STDERR "Warning: cannot precompute in-loop probabilities.\n";
    }

    ## if verbose, check whether local folding is active; warn about
    ## override of constraints over local folding
    if ($opts->{'verbose'}
        and $local_folding_requested
        and exists $seq->{"ANNO#S"}) {
        printerr "WARNING: structure constraints override local folding.\n";
    }

    # delegate to folding tool specific method
    if (not exists $delegate_compute_dotplot{$folding_tool}) {
        printerr "Internal ERROR: unknown folding tool $folding_tool.\n";
        exit -1;
    }
    $delegate_compute_dotplot{$folding_tool}($ppfile, $seq);

    ## CACHING: cache the result, if requested
    if (defined($opts->{'dp-cache'})) {
        if ( -e "$ppfile" ) {
            copy("$ppfile", "$opts->{'dp-cache'}/$seqfilename");
        }
    }
}

########################################
## extract_score_from_alignment
##
## get the alignment score from the output of locarna (or a similar pairwise aligner)
##
## @param $alntext text output of locarna
##
## @returns $score
##
## replaces -inf scores by large negative values
##
########################################
sub extract_score_from_alignment {
    my $aln = shift;
    $aln =~ /Score: (\S+)/ || die "Cannot extract score from pairwise alignment output:\n$aln\n";
    my $score=$1;

    ## replace -inf scores by something strongly negative (that
    ## does not immediately overflow); apparently, too negative values confuse tree construction
    if ($score eq "-inf") {$score=-1e8;}

    return $score;
}


## ----------------------------------------
## compute all pairwise alignments for pairs
## of sequences (given by @names)
##
## @returns ref to 2D-array of alignments (indices are
##          positions in list @names)
##
## @depends $opts->{'probabilistic'}   flag for probabilistic alignment
##
## @param $names       names of sequences
## @param $bmprobs     base match probs in case of probabilistic alignment
## @param $amprobs     arc  match probs in case of probabilistic alignment
## @param $thread_num         number of threads (default 1)
## @param[optional] $k compute only k-th fraction of $N
## @param[optional] $N total number of fractions
##
## @return hash of scores s at indices {a}{b},where a and b are input sequence
## indices and s is the score of aligning the sequences a and b
sub compute_all_pairwise_alignments {
    my ($names,$bmprobs,$amprobs,$thread_num,$k,$N) = @_;

    if (!defined($thread_num)) { $thread_num=1; }

    my @argument_lists;

    my $scores : shared;
    $scores = share({});

    # calculate number of pairwise alignments
    my $total = @$names * (@$names - 1) / 2;
    if (defined $k) {
        if ($k > $total) {
            return undef;
        }
        if ($N > $total) {
            $N = $total;
        }
    }

    ## if $k and $N are defined, compute only the alignments, where
    ## $num is in the range
    ## [ floor(($k-1)*$total/$N), floor($k*$total/$N) [

    ## collect arguments for function calls (=jobs)
    my $num=0;
    my $idx=0;
    for (my $a=0; $a<@$names; $a++) {
        for (my $b=0; $b<$a; $b++) {
            $idx++;

            if (defined $k) {
                if ( $idx-1 < int( ($k-1) * $total / $N ) ) {
                    next;
                }
                if ( $idx-1 >= int( $k * $total / $N ) ) {
                    last;
                }
            }
            $num++;

            if ( $opts->{'probabilistic'} ) {
	        my @arg_list = ($num,$a,$b,
			        $names->[$a],
			        $names->[$b],
			        $bmprobs->{$nnorm->nnamepair($names->[$a],$names->[$b])},
			        $amprobs->{$nnorm->nnamepair($names->[$a],$names->[$b])});
	        push @argument_lists, [ @arg_list ];
            } else {
	        my @arg_list=($num,$a,$b,
		              $names->[$a],
		              $names->[$b]
	            );
	        push @argument_lists, [ @arg_list ];
            }
        }
    }

    ## perform jobs in parallel
    foreach_par(
        sub {
            my ($num,$a,$b,@arg_list)=@_;

            printmsg 2, sprintf(
	        "Align %d/%d : $names->[$a] and %d/%d: $names->[$b] (%d/%d)\n",
	        $a+1,scalar(@$names),$b+1,$a,$num,scalar(@argument_lists));

            my @aln;

            if ( $opts->{'probabilistic'} ) {
	        my ($nameA,$nameB,$bmprobs,$amprobs) = @arg_list;
	        @aln = compute_alignment_from_dps_probs($input_dir,
						        $nameA,
						        $nameB,
						        \@locarna_params_tree,
						        $bmprobs,
						        $amprobs);
            } else {
	        my ($nameA,$nameB) = @arg_list;
	        @aln = compute_alignment_from_dps($input_dir,
					          $nameA,
					          $nameB,
					          \@locarna_params_tree
					          );
            }

            if (!exists $scores->{$a}) {
                $scores->{$a} = share({});
            }
            $scores->{$a}{$b} = extract_score_from_alignment("@aln");

        },
        \@argument_lists,
        $thread_num
        );


    return $scores;
}


## ----------------------------------------
## perform the progressive
## to compute the multiple alignment
##
## @param $tree the guide tree
## @param $bmprobs base match probabilities
## @param $amprobs arc match probabilities
##
## @pre the guide tree must be binary
## @pre needs input files
##
sub perform_progressive_steps {
    my $tree = shift;
    my $bmprobs = shift;
    my $amprobs = shift;
    my $max_alignment_size = shift;
    my $intermediate_name_data = shift;

    my $result =
    $tree->traverse(sub {
        my $tree = shift;
        my $children_data = shift;
        my $children = $tree->children();

        for my $child (@$children_data) {
            return undef unless defined $child;
        }

        if (@$children>=2) {
            if (@$children>2) {
                print "ERROR: the guide tree must be binary\n.";
                exit(-1);
            }

            my $size = $children_data->[0]->{size} + $children_data->[1]->{size};

            if (defined $max_alignment_size and $size>$max_alignment_size) {return undef;}

            my $intermediate_name = new_intermediate_name($intermediate_name_data,$tree->label());

            my $op1 = $children_data->[0]->{name};
            my $op2 = $children_data->[1]->{name};

            if ( $opts->{'probabilistic'} ) {
	        call_locarna_prob($op1,$op2,"$intermediate_dir/".$intermediate_name,$bmprobs,$amprobs);
            } else {
	        call_locarna($op1,$op2,"$intermediate_dir/".$intermediate_name);
            }

            return {
                name => "$intermediate_dir/$intermediate_name.pp",
                size => $size
            };
        } elsif (@$children==0) {
            return { name => "$input_dir/".$nnorm->nname($tree->label()), size => 1 };
        }
    });

    if (defined $result) {
        $result = $result->{name};
        $result =~ s/.pp$//;
    }
    return $result;
}


## ----------------------------------------
## Perform iterative refinement
##
## @returns file name of result
##
sub perform_iterative_refinement {
    my ($current_alignment_name,$tree,$names,$bmprobs,$amprobs,$intermediate_name_data) = @_;

    ## my $current_aliquality = alifold_mfe("$current_alignment_name.aln", @RNAfold_args); ## optimize SCI

    my $current_aliquality;
    if ($opts->{'probabilistic'}) {
        $current_aliquality = MLocarna::MatchProbs::aln_reliability_fromfile(
            "$current_alignment_name.aln",$bmprobs,$amprobs, $nnorm); ## optimize RELIABILITY
    } else {
        $current_aliquality = - alifold_mfe("$current_alignment_name.aln", @RNAfold_args); ## optimize SCI
    }

    for (my $i=1; $i<=$opts->{'iterations'}; $i++) {
        printmsg 1, "--- Iterative Refinement ---\n";
        printmsg 1, "--- Round $i\n";

        my $round_aliquality;
        ($current_alignment_name,$round_aliquality) =
            perform_iterative_refinement_one_round($i,
                $current_alignment_name,$current_aliquality,
                $tree,$names,$bmprobs,$amprobs,
                $intermediate_name_data
            );

        if ($current_aliquality == $round_aliquality) {
            printmsg 2, "Converge after round $i\n\n";
            last;
        }
        $current_aliquality=$round_aliquality;
    }
    return $current_alignment_name;
}

## ----------------------------------------
## Perform one round of iterative refinement
##
## @returns file name of result and alignment quality
##
sub perform_iterative_refinement_one_round {
    my ($round,
        $current_alignment_name,$current_aliquality,
        $tree,$names,$bmprobs,$amprobs,$intermediate_name_data) = @_;

    my $round_aliquality = $current_aliquality;

    $tree->traverse_partitions(sub {
        my $namesA = shift;
        my $namesB = subtract_list( $names, $namesA );

        printmsg 2, "Realign split @$namesA --- @$namesB\n";

        my $aln = read_aln_wo_anno("$current_alignment_name.aln");

        my $intermediate_name = "$intermediate_dir/".new_intermediate_name($intermediate_name_data,"-round$round",1);

        my $intermediate_nameA = "$intermediate_name.A.pp";
        my $intermediate_nameB = "$intermediate_name.B.pp";
        write_pp_projected($aln,$namesA,$intermediate_nameA);
        write_pp_projected($aln,$namesB,$intermediate_nameB);


        if ( $opts->{'probabilistic'} ) {
            call_locarna_prob($intermediate_nameA,$intermediate_nameB,$intermediate_name,$bmprobs,$amprobs);
        } else {
            call_locarna($intermediate_nameA,$intermediate_nameB,$intermediate_name);
        }

        ## ----------------------------------------
        # test, whether new alignment is better than former
        #
        my $new_aliquality;# = alifold_mfe("$intermediate_name.aln", @RNAfold_args);
        if ($opts->{'probabilistic'}) {
            $new_aliquality = MLocarna::MatchProbs::aln_reliability_fromfile(
                "$intermediate_name.aln",$bmprobs,$amprobs, $nnorm); ## optimize RELIABILITY
        } else {
            $new_aliquality = - alifold_mfe("$intermediate_name.aln", @RNAfold_args); ## optimize SCI
        }
        printmsg 2, sprintf "Quality: %5.2f / %5.2f\n",$new_aliquality,$current_aliquality;

        if ($new_aliquality > $current_aliquality) {
            printmsg 0, "Accept new alignment (improves alifold-mfe: $current_aliquality ==> $new_aliquality)\n";
            $current_aliquality = $new_aliquality;
            $current_alignment_name = $intermediate_name;
        } else {
            printmsg 0, "Reject new alignment (worse or equal alifold-mfe: $current_aliquality <= $new_aliquality)\n";
        }
        printmsg 0, "\n";
    });

    return ($current_alignment_name,$current_aliquality);
}

## EVALUATION MODE
## to keep this simple, assume that probabilities
## are already computed and given in tgtdir
sub run_evaluation_mode {
    my ($alnfile,$probsdir)=@_;

    ## TODO: this should be changed for use of alnloh representation
    ## such that sequence name order is preserved
    my $aln;

    if ( $opts->{'eval-fasta'} ) {
         my $alnlist = read_fasta($alnfile);
        my %alnhash=();
        foreach my $seq (@$alnlist) {
            $alnhash{$seq->{name}} = $seq->{seq};
        }
        $aln = \%alnhash;
    } else {
        $aln = read_aln_wo_anno($alnfile);
    }

     ## register normalized names that can be used as filenames
    for my $name (keys %$aln) {
        $nnorm->register_name($name);
    }

    my $bmprobs = MLocarna::MatchProbs::read_bm_probs("$probsdir/bmprobs");
    my $amprobs = MLocarna::MatchProbs::read_am_probs("$probsdir/amprobs");

    write_aln($aln);
    printmsg 3, "\n";


    my $cheap_score = MLocarna::MatchProbs::aln_reliability($aln,$bmprobs,$amprobs, $nnorm);

    my ($score,$rel_str) = MLocarna::MatchProbs::evaluate_alignment($aln,$bmprobs,$amprobs,1);

    printmsg 3, "RELIABILITY 1/COL   ".sprintf("%5.2f",$cheap_score)."\%\n";
    printmsg 3, "RELIABILITY 2/COL  ".sprintf("%5.2f",$score)."\%\n";
    printmsg 3, "MAX REL. STRUCT.   $rel_str\n";

    my $len = aln_length($aln);
    my $corrected_len=aln_length_atleastonematch($aln);

    my $corrected_score=$score*$len/$corrected_len;
    my $corrected_cheap_score=$cheap_score*$len/$corrected_len;
    #
    printmsg 3, "RELIABILITY 1/CCOL  ".sprintf("%5.2f",$corrected_cheap_score)."\%\n";
    printmsg 3, "RELIABILITY 2/CCOL  ".sprintf("%5.2f",$corrected_score)."\%\n";


}

########################################
## pairwise_reference_alignment_option($nameA,$nameB,$aln)
##
## generate option for pairwise reference alignment of sequences
## with names $nameA and $nameB in the multiple alignment $aln
##
## current implementation returns option for passing the whole
## multiple alignment to the locarna executable. Instead, one could
## pass only the pairwise alignment, avoiding some IO.
##
## @pre $aln contains sequences $nameA and $nameB
##
## @param $nameA name of sequence A
## @param $nameB name of sequence B
## @param $refaln reference alignment
## @param $refaln_filename name of file with reference alignment
##
## @returns option list or () if $aln undefined
sub pairwise_reference_alignment_option {
    my ($nameA,$nameB,$refaln,$refaln_filename) = @_;
    return reference_alignment_option($refaln_filename);
}

########################################
## reference_alignment_option($aln)
##
## return option for passing the alignment $aln as reference alignment
## to locarna
##
## @param $refaln_filename name of file with reference alignment
##
## @returns option list or () if $refaln_filename undefined
sub reference_alignment_option {
    my ($refaln_filename) =@_;
    if ( $opts->{'max-diff-aln'} ) {
        return ("--max-diff-aln" => "$refaln_filename");
    } else {
        return ();
    }
}

########################################
## is_aln_of_seqs($seqs,$reference_aln)
##
## check whether $reference_aln contains an alignment of the sequences $seq
##
## @param $seqs list of sequences with names (as returned by read_fasta)
## @param $reference_aln reference alignment in loh (as returned by read_aln_wo_anno)
##
## @returns boolean whether $reference_aln contains an alignment of the sequences $seq
##
sub is_aln_of_seqs {
    my ($seqs,$aln) = @_;

    foreach my $seq ( @$seqs ) {
        my $found=0;
        foreach my $alnseq ( @$aln ) {
            if ($seq->{name} eq $alnseq->{name}) {

	        my $alnseqstr = $alnseq->{seq};
	        $alnseqstr =~ tr/~.-//d;
	        $alnseqstr = uc $alnseqstr;
	        $alnseqstr =~ s/T/U/g;

	        my $seqstr=$seq->{seq};
	        $seqstr = uc $seqstr;
	        $seqstr =~ s/T/U/g;

	        $found = ( $alnseqstr eq $seqstr );

	        if ($opts->{'verbose'} && !$found) {
	            print STDERR "Warning: name match ($seq->{name}) but sequence mismatch.\n";
	            print STDERR "  Sequences: $seqstr\n";
                    print STDERR "  Alignment: $alnseqstr\n";
	        }

	        last;
            }
        }
        if (!$found) {
            return 0;
        }
    }
    return 1;
}

########################################
## project_sequences_for_realign($aln)
##
## @param $refaln reference alignment (list of hash alignment)
##
## @returns sequences (reference to list of hash representation)
##
## process consensus constraint annotation; convert to single sequence constraints
##
## @note '-', '.', and '~' are considered gap symbols
##
sub project_sequences_for_realign {
    my $aln = shift;

    my $gap_symbols="-.~";

    my @seqs=();

    my $anchors;
    my $structure;
    my $fixed_structure;

    for my $entry (@$aln) {
        if ( $entry->{name} eq "#A1" ) {
            $anchors = $entry->{seq};
        } elsif ( $entry->{name} eq "#S" ) {
            $structure = $entry->{seq};
        }
        elsif ( $entry->{name} eq "#FS" ) {
            $fixed_structure = $entry->{seq};
        }
        elsif ( $entry->{name} =~ /^#A\d/ ) {
            ##ignore
        }
        elsif ( $entry->{name} =~ /^#/ ) {
            printerr "WARNING: project_sequences_for_realign: unsupported annotation $entry->{name}.\n";
        }
    }

    my @anchor_idx;
    my $anchor_width=0;

    my @consensus_anchors=();

    if ($anchors) {
        my $idx=0;
        for (my $i=0; $i<length($anchors); $i++) {
            if (substr($anchors,$i,1) !~ /[$gap_symbols]/) {
                $anchor_idx[$i]=$idx;
                $idx++;
            } else {
                $anchor_idx[$i]=undef;
            }
        }
        if ($idx>0) {
            $anchor_width = length(sprintf("%d",$idx-1));
        }

        for (my $i=0; $i < $anchor_width; $i++) {
            push @consensus_anchors, "." x length($anchors);
        }

        ## generate the anchor constraint strings
        my @anchor_lines=();
        if ($anchors) {
            for (my $i=0; $i<length($anchors); $i++) {
                if (defined($anchor_idx[$i])) {
                    my @id = split //, sprintf("%0${anchor_width}d",$anchor_idx[$i]);
                    for (my $j=0; $j<$anchor_width; $j++) {
                        substr($consensus_anchors[$j],$i,1)
                          = $id[$j];
                    }
                }
            }
        }
    }

    for my $entry (@$aln) {
        next if ( $entry->{name} =~ /^#/ ); # skip annotations
        my $seqs_entry = { name=>$entry->{name}, descr=>"", seq=>$entry->{seq}};

        $seqs_entry->{seq} =~ s/[$gap_symbols]//g;

        if ($anchors) {
            # generate ANNO#i entries
            for (my $i=0; $i < $anchor_width; $i++) {
                $seqs_entry->{"ANNO#".($i+1)} =
                  project_string_to_alignment_sequence($consensus_anchors[$i],$entry->{seq},$gap_symbols);
            }
        }
        if ($structure) {
            $seqs_entry->{"ANNO#S"} =
              project_structure_to_alignment_sequence($structure,
                                                      $entry->{seq},
                                                      $gap_symbols,
                                                      "(",")",".");
        }
        if ($fixed_structure) {
            $seqs_entry->{"ANNO#FS"} =
              project_structure_to_alignment_sequence($fixed_structure,
                                                      $entry->{seq},
                                                      $gap_symbols,
                                                      "(",")",".");
        }

        push @seqs, $seqs_entry;
    }
    return \@seqs;
}

### ============================================================================
### ============================================================================
