Pattern recognition in computational molecular biology : techniques and approaches / edited by Mourad Elloumi, Costas S. Iliopoulos, Jason T. L. Wang, Albert Y. Zomaya. -- Hoboken, New Jersey : John Wiley & Sons Inc., c2016. -- (58.178056 /P316)
Contents
LIST OF CONTRIBUTORS
PREFACE
I PATTERN RECOGNITION IN SEQUENCES
1 COMBINATORIAL HAPLOTYPING PROBLEMS
1.1 Introduction / 3
1.2 Single Individual Haplotyping / 5
1.3 Population Haplotyping / 12
References
2 ALGORITHMIC PERSPECTIVES OF THE STRING BARCODING PROBLEMS
2.1 Introduction / 28
2.2 Summary of Algorithmic Complexity Results for Barcoding Problems / 32
2.3 Entropy-Based Information Content Technique for Designing Approximation Algorithms for String Barcoding Problems / 34
2.4 Techniques for Proving Inapproximability Results for String Barcoding Problems / 36
2.5 Heuristic Algorithms for String Barcoding Problems / 39
2.6 Conclusion / 40
Acknowledgments / 41
References / 41
3 ALIGNMENT-FREE MEASURES FOR WHOLE-GENOME COMPARISON
3.1 Introduction / 43
3.2 Whole-Genome Sequence Analysis / 44
3.3 Underlying Approach / 47
3.4 Experimental Results / 54
3.5 Conclusion / 61
Author's Contributions / 62
Acknowledgments / 62
References / 62
4 A MAXIMUM LIKELIHOOD FRAMEWORK FOR MULTIPLE SEQUENCE LOCAL ALIGNMENT
4.1 Introduction / 65
4.2 Multiple Sequence Local Alignment / 67
4.3 Motif Finding Algorithms / 70
4.4 Time Complexity / 75
4.5 Case Studies / 75
4.6 Conclusion / 80
References / 81
5 GLOBAL SEQUENCE ALIGNMENT WITH A BOUNDED NUMBER OF GAPS
5.1 Introduction / 83
5.2 Definitions and Notation / 85
5.3 Problem Definition / 87
5.4 Algorithms / 88
5.5 Conclusion / 94
References / 95
II PATTERN RECOGNITION IN SECONDARY STRUCTURES
6 A SHORT REVIEW ON PROTEIN SECONDARY STRUCTURE PREDICTION METHODS
6.1 Introduction / 99
6.2 Representative Protein Secondary Structure Prediction Methods / 102
6.3 Evaluation of Protein Secondary Structure Prediction Methods / 106
6.4 Conclusion / 110
Acknowledgments / 110
References / 111
7 A GENERIC APPROACH TO BIOLOGICAL SEQUENCE SEGMENTATION PROBLEMS: APPLICATION TO PROTEIN SECONDARY STRUCTURE PREDICTION
7.1 Introduction / 114
7.2 Biological Sequence Segmentation / 115
7.3 MSVMpred / 117
7.4 Postprocessing with a Generative Model / 119
7.5 Dedication to Protein Secondary Structure Prediction / 120
7.6 Conclusions and Ongoing Research / 125
Acknowledgments / 126
References / 126
8 STRUCTURAL MOTIF IDENTIFICATION AND RETRIEVAL: A GEOMETRICAL APPROACH
8.1 Introduction / 129
8.2 A Few Basic Concepts / 130
8.3 State of the Art / 135
8.4 A Novel Geometrical Approach to Motif Retrieval / 138
8.5 Implementation Notes / 149
8.6 Conclusions and Future Work / 151
Acknowledgment / 152
References / 152
9 GENOME-WIDE SEARCH FOR PSEUDOKNOTTED NONCODING RNAs: A COMPARATIVE STUDY
9.1 Introduction / 155
9.2 Background / 156
9.3 Methodology / 157
9.4 Results and Interpretation / 161
9.5 Conclusion / 162
References / 163
III PATTERN RECOGNITION IN TERTIARY STRUCTURES
10 MOTIF DISCOVERY IN PROTEIN 3D-STRUCTURES USING GRAPH MINING TECHNIQUES
10.1 Introduction / 167
10.2 From Protein 3D-Structures to Protein Graphs / 169
10.3 Graph Mining / 172
10.4 Subgraph Mining / 173
10.5 Frequent Subgraph Discovery / 173
10.6 Feature Selection / 179
10.7 Feature Selection for Subgraphs / 180
10.8 Discussion / 183
10.9 Conclusion / 185
Acknowledgments / 185
References / 186
11 FUZZY AND UNCERTAIN LEARNING TECHNIQUES FOR THE ANALYSIS AND PREDICTION OF PROTEIN TERTIARY STRUCTURES
11.1 Introduction / 190
11.2 Genetic Algorithms / 192
11.3 Supervised Machine Learning Algorithm / 201
11.4 Fuzzy Application / 204
11.5 Conclusion / 207
References / 208
12 PROTEIN INTER-DOMAIN LINKER PREDICTION
12.1 Introduction / 212
12.2 Protein Structure Overview / 213
12.3 Technical Challenges and Open Issues / 214
12.4 Prediction Assessment / 215
12.5 Current Approaches / 216
12.6 Domain Boundary Prediction Using Enhanced General Regression Network / 220
12.7 Inter-Domain Linkers Prediction Using Compositional Index Simulated Annealing / 227
12.8 Conclusion / 232
References / 233
13 PREDICTION OF PROLINE CIS-TRANS ISOMERIZATION
13.1 Introduction / 236
13.2 Methods / 238
13.3 Model Evaluation and Analysis / 243
13.4 Conclusion / 245
References / 245
IV PATTERN RECOGNITION IN QUATERNARY STRUCTURES
14 PREDICTION OF PROTEIN QUATERNARY STRUCTURES
14.1 Introduction / 251
14.2 Protein Structure Prediction / 255
14.3 Template-Based Predictions / 257
14.4 Critical Assessment of Protein Structure Prediction / 258
14.5 Quaternary Structure Prediction / 258
14.6 Conclusion / 261
Acknowledgments / 261
References / 261
15 COMPARISON OF PROTEIN QUATERNARY STRUCTURES BY GRAPH APPROACHES
15.1 Introduction / 266
15.2 Similarity in the Graph Model / 268
15.3 Measuring Structural Similarity via MCES / 272
15.4 Protein Comparison via Graph Spectra / 279
15.5 Conclusion / 287
References / 287
16 STRUCTURAL DOMAINS IN PREDICTION OF BIOLOGICAL PROTEIN-PROTEIN INTERACTIONS
16.1 Introduction / 291
16.2 Structural Domains / 293
16.3 The Prediction Framework / 293
16.4 Feature Extraction and Prediction Properties / 294
16.5 Feature Selection / 299
16.6 Classification / 301
16.7 Evaluation and Analysis / 304
16.8 Results and Discussion / 304
16.9 Conclusion / 309
References / 310
V PATTERN RECOGNITION IN MICROARRAYS / 315
17 CONTENT-BASED RETRIEVAL OF MICROARRAY EXPERIMENTS
17.1 Introduction / 317
17.2 Information Retrieval: Terminology and Background / 318
17.3 Content-Based Retrieval / 320
17.4 Microarray Data and Databases / 322
17.5 Methods for Retrieving Microarray Experiments / 324
17.6 Similarity Metrics / 327
17.7 Evaluating Retrieval Performance / 329
17.8 Software Tools / 330
17.9 Conclusion and Future Directions / 331
Acknowledgment / 332
References / 332
18 EXTRACTION OF DIFFERENTIALLY EXPRESSED GENES IN MICROARRAY DATA
18.1 Introduction / 335
18.2 From Microarray Image to Signal / 336
18.3 Microarray Signal Analysis / 337
18.4 Algorithms for DE Gene Selection / 339
18.5 Gene Ontology Enrichment and Gene Set Enrichment Analysis / 343
18.6 Conclusion / 345
References / 345
19 CLUSTERING AND CLASSIFICATION TECHNIQUES FOR GENE EXPRESSION PROFILE PATTERN ANALYSIS
19.1 Introduction / 347
19.2 Transcriptome Analysis / 348
19.3 Microarrays / 349
19.4 RNA-Seq / 351
19.5 Benefits and Drawbacks of RNA-Seq and Microarray Technologies / 353
19.6 Gene Expression Profile Analysis / 356
19.7 Real Case Studies / 364
19.8 Conclusions / 367
References / 368
20 MINING INFORMATIVE PATTERNS IN MICROARRAY DATA / 371
20.1 Introduction / 371
20.2 Patterns with Similarity / 373
20.3 Conclusion / 391
References / 391
21 ARROW PLOT AND CORRESPONDENCE ANALYSIS MAPS FOR VISUALIZING THE EFFECTS OF BACKGROUND CORRECTION AND NORMALIZATION METHODS ON MICROARRAY DATA / 394
21.1 Overview / 394
21.2 Arrow Plot / 399
21.3 Significance Analysis of Microarrays / 404
21.4 Correspondence Analysis / 405
21.5 Impact of the Preprocessing Methods / 407
21.6 Conclusions / 412
Acknowledgments / 413
References / 413
VI PATTERN RECOGNITION IN PHYLOGENETIC TREES
22 PATTERN RECOGNITION IN PHYLOGENETICS: TREES AND NETWORKS
22.1 Introduction / 419
22.2 Networks and Trees / 420
22.3 Patterns and Their Processes / 424
22.4 The Types of Patterns / 427
22.5 Fingerprints / 431
22.6 Constructing Networks / 433
22.7 Multi-Labeled Trees / 435
22.8 Conclusion / 436
References / 437
23 DIVERSE CONSIDERATIONS FOR SUCCESSFUL PHYLOGENETIC TREE RECONSTRUCTION: IMPACTS FROM MODEL MISSPECIFICATION, RECOMBINATION, HOMOPLASY, AND PATTERN RECOGNITION
23.1 Introduction / 440
23.2 Overview on Methods and Frameworks for Phylogenetic Tree Reconstruction / 440
23.3 Influence of Substitution Model Misspecification on Phylogenetic Tree Reconstruction / 445
23.4 Influence of Recombination on Phylogenetic Tree Reconstruction / 446
23.5 Influence of Diverse Evolutionary Processes on Species Tree Reconstruction / 447
23.6 Influence of Homoplasy on Phylogenetic Tree Reconstruction: The Goals of Pattern Recognition / 449
23.7 Concluding Remarks / 449
Acknowledgments / 450
References / 450
24 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES
24.1 Introduction / 457
24.2 Preliminaries / 459
24.3 A Naive Approach / 462
24.4 Toward a Faster Method / 463
24.5 Improved Algorithm / 467
24.6 Implementation / 473
24.7 Evaluation / 474
24.8 Conclusion / 479
Acknowledgment / 481
References / 481
25 A NEW FAST METHOD FOR DETECTING AND VALIDATING HORIZONTAL GENE TRANSFER EVENTS USING PHYLOGENETIC TREES AND AGGREGATION FUNCTIONS
25.1 Introduction / 483
25.2 Methods / 485
25.3 Experimental Study / 491
25.4 Results and Discussion / 501
25.5 Conclusion / 502
References / 503
VII PATTERN RECOGNITION IN BIOLOGICAL NETWORKS
26 COMPUTATIONAL METHODS FOR MODELING BIOLOGICAL INTERACTION NETWORKS
26.1 Introduction / 507
26.2 Measures/Metrics / 508
26.3 Models of Biological Networks / 511
26.4 Reconstructing and Partitioning Biological Networks / 511
26.5 PPI Networks / 513
26.6 Mining PPI Networks--Interaction Prediction / 517
26.7 Conclusions / 519
References / 519
27 BIOLOGICAL NETWORK INFERENCE AT MULTIPLE SCALES: FROM GENE REGULATION TO SPECIES INTERACTIONS
27.1 Introduction / 525
27.2 Molecular Systems / 528
27.3 Ecological Systems / 528
27.4 Models and Evaluation / 529
27.5 Learning Gene Regulation Networks / 532
27.6 Learning Species Interaction Networks / 540
27.7 Conclusion / 550
References / 550
28 DISCOVERING CAUSAL PATTERNS WITH STRUCTURAL EQUATION MODELING: APPLICATION TO TOLL-LIKE RECEPTOR SIGNALING PATHWAY IN CHRONIC LYMPHOCYTIC LEUKEMIA
28.1 Introduction / 555
28.2 Toll-Like Receptors / 557
28.3 Structural Equation Modeling / 560
28.4 Application / 566
28.5 Conclusion / 580
References / 581
29 ANNOTATING PROTEINS WITH INCOMPLETE LABEL INFORMATION
29.1 Introduction / 585
29.2 Related Work / 587
29.3 Problem Formulation / 589
29.4 Experimental Setup / 592
29.5 Experimental Analysis / 596
29.6 Conclusions / 605
Acknowledgments / 606
References / 606
INDEX