WORKFLOW RE-USE AND DISCOVERY IN BIOINFORMATICS
A THESIS SUBMITTED TO THE UNIVERSITY OF MANCHESTER
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
IN THE FACULTY OF ENGINEERING AND PHYSICAL SCIENCES
2008
By
Antoon Goderis
School of Computer Science
Contents
Abstract
Declaration
Copyright
Acknowledgements
1 Introduction
1.1 Research questions
1.2 Thesis structure
1.3 Publications
1.4 External contributions
2 Workflows, workflow re-use and repurposing
2.1 Workflows in science
2.1.1 Why workflows in science?
2.1.2 Anatomy of a scientific workflow
2.1.3 Formal definition
2.2 Workflow re-use and repurposing
2.2.1 Definition
2.2.2 Case studies in workflow re-use
2.2.3 Workflow re-use requirements
2.3 Related work
2.3.1 Workflow re-use in business
2.3.2 Workflow re-use in science
2.4 Summary
3 Workflow discovery
3.1 Definition
3.1.1 Relation to workflow re-use and repurposing
3.1.2 Relation between discovery and composition
3.2 Workflow discovery requirements
3.2.1 Scalable discovery techniques
3.2.2 A comprehensive discovery model
3.2.3 The process knowledge acquisition bottleneck
3.2.4 Lack of workflow fragment rankings
3.3 Information need for workflow discovery
3.3.1 Construction of an in silico analysis
3.3.2 Linking work done in vivo and in vitro with work done in silico
3.3.3 Validation and extension of publications
3.4 Workflow discovery matching types
3.4.1 Workflow discovery by signature and structure
3.4.2 Structural workflow matching types
3.4.3 Similarity-based matching
3.4.4 Complement-based matching
3.5 Workflow discovery tasks formally
3.5.1 Calculating workflow similarity
3.5.2 Finding workflow extensions
3.5.3 Finding workflow insertions
3.5.4 Finding workflow replacements
3.6 Related work
3.6.1 Scope
3.6.2 Discovery support in scientific workflow systems
3.6.3 Techniques in support of concrete workflow discovery
3.6.4 Classifying techniques by workflow matching conditions
3.7 Summary and discussion
4 Building benchmarks for workflow re-use
4.1 Overview of experiments
4.1.1 Experimental setup
4.1.2 Participants
4.1.3 Materials
4.1.4 Procedure
4.1.5 Results
4.2 Experiment 1: cross author, white box re-use
4.2.1 Experimental setup
4.2.2 Participants
4.2.3 Materials
4.2.4 Procedure
4.2.5 Results
4.3 Experiment 2: cross author, black box re-use
4.3.1 Experimental setup
4.3.2 Participants
4.3.3 Materials
4.3.4 Procedure
4.3.5 Results
4.4 Experiment 3: cross author, black box re-use
4.4.1 Experimental setup
4.4.2 Participants and procedure
4.4.3 Materials
4.4.4 Procedure
4.4.5 Results
4.5 Experiment 4: personal, black box re-use
4.5.1 Experimental setup
4.5.2 Participants and procedure
4.5.3 Materials
4.5.4 Results
4.6 Experiment 5: cross author, grey box re-use
4.6.1 Experimental setup
4.6.2 Participants
4.6.3 Materials
4.6.4 Procedure
4.6.5 Results
4.7 Related work
4.8 Summary
4.8.1 Workflow re-use and discovery requirements confirmed
4.8.2 Understanding of workflow re-use and discovery behaviour
5 Workflow discovery techniques
5.1 Overview of techniques
5.1.1 Data flows in Taverna
5.1.2 Source of workflow documentation
5.1.3 Chapter structure
5.2 Related work
5.3 Google4WF: Full Text
5.3.1 Knowledge acquisition bottleneck
5.3.2 Logical document view
5.3.3 Rankings
5.4 Woogle4WF: Full Text + Structure
5.4.1 Knowledge acquisition bottleneck
5.4.2 Logical document view
5.4.3 Ranking
5.5 JMFeta: Index Terms
5.5.1 Knowledge acquisition bottleneck
5.5.2 Logical document view
5.5.3 Ranking
5.6 OWL4WF: Index Terms + Structure
5.6.1 The promise of OWL DL for service discovery
5.6.2 Description Logics in a nutshell
5.6.3 Knowledge acquisition bottleneck
5.6.4 Logical document view
5.6.5 Ranking
5.6.6 Outlook
5.7 GUB4WF: Index Terms + Structure
5.7.1 Knowledge acquisition bottleneck
5.7.2 Logical document view
5.7.3 Rankings
5.8 Summary
6 Evaluation of discovery techniques on benchmarks
6.1 Evaluation method
6.2 Similarity-based personal and cross-author discovery
6.2.1 Results based on data from Experiment 1
6.2.2 Results based on Benchmarks 1 and 2
CHAPTER 3. WORKFLOW DISCOVERY
but this matching feature may not increase their similarity much because workflow 1’s
output role does not correspond to workflow 2’s input role. Users who want to find
workflows that take in 3D structures will judge workflows that produce them as final
output to be irrelevant.
Choosing a model of similarity. It is not obvious which similarity approach and
metrics are best suited to a given workflow discovery context. When is it enough to
count the operations shared and not shared between workflows, and when must one
carefully distinguish between the roles of inputs and outputs? We do not answer
these questions. Rather, we develop conditions that specify identity between
different workflow features to help answer them. The conditions relate to the two
cognitive models of similarity discussed earlier as follows:
• In terms of the featural approach to similarity, the conditions specify when
particular features and groups of features are common between workflows. Our
features correspond to workflow elements and combinations thereof, as introduced
earlier in section 2.1.3. The more elements are identical between workflows, the
more similar they are. When the conditions are not satisfied, they are informative
of the distinctive features between workflows, as needed to calculate the Tversky
measure. The conditions can serve as a basis for exploring a featural approach.
• The conditions capture the structure contained within workflows. We specify
basic conditions to describe when and how workflows share elements at different
levels. They provide the building blocks for an alignment-based approach.
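To make the featural view concrete, the following minimal sketch computes Tversky's ratio model over two sets of workflow features. It is an illustration only: the choice of the ratio form, the weights alpha and beta, the function name tversky_index and the operation names in the example are all assumptions, not definitions taken from this thesis. Common features raise the score, while distinctive features on either side lower it.

```python
def tversky_index(features_a, features_b, alpha=0.5, beta=0.5):
    """Tversky ratio model over two feature sets (illustrative sketch).

    alpha weights the features found only in A, beta those found only in B.
    Both weights are assumed values chosen for the example.
    """
    a, b = set(features_a), set(features_b)
    common = len(a & b)   # features shared by both workflows
    only_a = len(a - b)   # distinctive features of workflow A
    only_b = len(b - a)   # distinctive features of workflow B
    denominator = common + alpha * only_a + beta * only_b
    if denominator == 0:
        # Convention adopted for this sketch: nothing to compare scores 0.
        return 0.0
    return common / denominator


# Hypothetical operation names, used as features of two workflows.
workflow_1 = {"fetch_sequence", "blast", "align"}
workflow_2 = {"blast", "align", "render_tree"}

print(tversky_index(workflow_1, workflow_2))        # symmetric weights
print(tversky_index(workflow_1, workflow_2, 1, 1))  # reduces to Jaccard
```

With alpha = beta = 1 the measure reduces to the Jaccard coefficient. Unequal weights make the comparison asymmetric, which may matter when a query workflow is compared against a candidate.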
In summary, we claim that the identity conditions developed in the following
subsections will help determine suitable models of similarity for different workflow
discovery tasks. We investigate identity-based matching at the parameter, operation
and overall workflow levels.
Identity matching at the parameter level
The parameter is at the lowest level in the workflow definition of section 2.1.3. We
define the polymorphic function SamePar to pin down what is meant by identity
matching at the parameter level. It says that two parameters of operations are identical
if they have the same name, the same syntactic type and the same semantic type.
SamePar is defined as:
1) The case for identity between input parameters.
$SamePar : IN \times IN \rightarrow Boolean$
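As an informal illustration of the input case, the sketch below treats an input parameter as a record with a name, a syntactic type and a semantic type, and tests all three for equality. The class InputParameter, its field names, the use of plain strings for types and the example values are assumptions made for this sketch; the formal definition in this section remains authoritative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputParameter:
    """Hypothetical record for an input parameter of an operation."""
    name: str
    syntactic_type: str  # assumed to be a plain string, e.g. a data format
    semantic_type: str   # assumed to be a plain string, e.g. a concept label


def same_par(p: InputParameter, q: InputParameter) -> bool:
    """True only if name, syntactic type and semantic type all coincide."""
    return (
        p.name == q.name
        and p.syntactic_type == q.syntactic_type
        and p.semantic_type == q.semantic_type
    )


# Hypothetical example values.
seq_a = InputParameter("sequence", "string", "ProteinSequence")
seq_b = InputParameter("sequence", "string", "ProteinSequence")
seq_c = InputParameter("sequence", "string", "DNASequence")

print(same_par(seq_a, seq_b))  # same name and both types
print(same_par(seq_a, seq_c))  # semantic types differ
```

Comparing types as strings is the simplest possible choice. A richer setting might compare semantic types through an ontology, which this sketch does not attempt.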