Substantial enhancement of data retrieval in a drug information system using error-tolerant search algorithms
J Kaltschmidt, SPW Schmitt, MG Pruszydlo, WE Haefeli
Dept. of Internal Medicine VI, Clinical Pharmacology and Pharmacoepidemiology, Im Neuenheimer Feld 410, 69120 Heidelberg
INTRODUCTION & AIM
Electronic drug information systems have considerable advantages over other information sources. However, although data retrieval in electronic systems is unsurpassed,
it depends strongly on the entry of adequate search terms. Roughly 1 in 5 entries into our drug information system AiDKlinik did not match the spelling of a drug name or
an active ingredient, which rendered data retrieval impossible and system performance suboptimal.
A single classic search algorithm such as Levenshtein or Soundex would not solve the problem thoroughly, because active ingredients such as „ciclosporin“ may
have several common spellings („Cyclosporin“ and „Ciclosporin“). The user would therefore have to know how the information is stored in the system.
The aim of this study was to develop and test an electronic error-tolerant search algorithm to minimise retrieval failure due to spelling errors in a hospital environment.
METHODS
We developed a multistage error-tolerant algorithm (ETA) that
combines several intelligent fuzzy matching methods.
ETA considers different kinds of „wrong entries“ such as:
- phonetic errors (e.g. „Nadopa“ = „Madopar“)
- misspellings (e.g. „Cidovovir“ = „Cidofovir“)
- letter ordering mistakes (e.g. „Mydoclam“ = „Mydocalm“)
- synonyms (e.g. „Phenylacetamid“ = „Acetanilid“)
- multiple correct spellings (e.g. „Cyclosporin“ = „Ciclosporin“)
- alternative suffixes (e.g. „Chlodronsäure“ = „Chlodronat“)
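The error classes listed above can be illustrated with a small multistage matcher. This is only a sketch under stated assumptions: the poster does not describe ETA's actual stages, their order, or its dictionaries, so the mini vocabulary, the synonym and suffix tables, the sound-alike folding rules, the distance threshold and all function names below are hypothetical. The optimal string alignment distance (Levenshtein plus adjacent transpositions) stands in for whatever fuzzy matching ETA really uses.

```python
from typing import Optional

# Hypothetical mini vocabulary built from the poster's examples only.
DRUG_NAMES = ["Madopar", "Cidofovir", "Mydocalm", "Acetanilid", "Ciclosporin", "Chlodronat"]
# Hypothetical lookup tables for synonyms / alternative spellings and suffixes.
SYNONYMS = {"phenylacetamid": "Acetanilid", "cyclosporin": "Ciclosporin"}
SUFFIX_PAIRS = [("säure", "at")]
# Crude, illustrative sound-alike folding (not Soundex or any published scheme).
SOUND_ALIKES = [("ph", "f"), ("v", "f"), ("y", "i"), ("k", "c"), ("z", "c"), ("n", "m")]


def phonetic_key(word: str) -> str:
    """Fold letters that sound alike so that phonetic variants share a key."""
    key = word.lower()
    for src, dst in SOUND_ALIKES:
        key = key.replace(src, dst)
    return key


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: edits plus adjacent transpositions."""
    prev2 = None
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
            # Two swapped neighbouring letters count as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[len(b)]


def suggest(query: str, max_distance: int = 2) -> Optional[str]:
    """Return the best matching drug name, or None if no stage succeeds."""
    q = query.strip().lower()
    by_lower = {name.lower(): name for name in DRUG_NAMES}
    if q in by_lower:                      # stage 1: direct match
        return by_lower[q]
    if q in SYNONYMS:                      # stage 2: synonyms and known spellings
        return SYNONYMS[q]
    for old, new in SUFFIX_PAIRS:          # stage 3: alternative suffixes
        if q.endswith(old):
            candidate = q[: -len(old)] + new
            if candidate in by_lower:
                return by_lower[candidate]
    key = phonetic_key(q)                  # stage 4: fuzzy match on phonetic keys
    best = min(DRUG_NAMES, key=lambda name: osa_distance(key, phonetic_key(name)))
    if osa_distance(key, phonetic_key(best)) <= max_distance:
        return best
    return None


for entry in ["Nadopa", "Cidovovir", "Mydoclam", "Phenylacetamid", "Chlodronsäure"]:
    print(entry, "->", suggest(entry))
```

Running the cheap exact and table lookups first and the distance computation last keeps the expensive stage for the queries that need it; whether ETA orders its stages this way is not stated in the poster.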
This multistage algorithm was integrated into our web-based drug
information system AiDKlinik. We evaluated its performance in 174050
consecutive searches, 18% of which did not directly match a brand name
or an active ingredient and therefore triggered ETA.
To evaluate whether the search results suggested by ETA indeed yielded
the information sought by the user, we analysed the actions (clicks)
immediately following the display of the search results. The
selection of a specific brand name or of a summary of product
characteristics (SPC) was taken as unequivocal confirmation that the
searched product had been retrieved (see Figure 2, green areas).
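The click-based evaluation described above amounts to counting how often a result page is followed by a confirming action. The following is a minimal sketch, assuming each search is logged with a single label for the next user action; the label names and the function are hypothetical and not taken from AiDKlinik.

```python
from collections import Counter
from typing import Iterable

# Hypothetical action labels; only the first two count as confirmation.
CONFIRMING_ACTIONS = {"trade_name", "spc"}


def confirmation_rate(actions: Iterable[str]) -> float:
    """Share of searches whose next action was a trade name or SPC click."""
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    confirmed = sum(counts[action] for action in CONFIRMING_ACTIONS)
    return confirmed / total


# Invented example log, one label per search.
log = ["trade_name", "new_search", "spc", "abort"]
print(confirmation_rate(log))
```

A new search, an abort or any other action is treated as unconfirmed here, which makes the resulting rate a lower bound on successful retrieval, in line with the poster's "at least" wording.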
RESULTS
Table 1: Top 20 mistyped entries
Query string     Frequency
amitryptilin     72
nabic            60
immodium         59
cyclosporin      51
delix plus       47
thalidomid       47
orthomol         45
paracodein       45
cellsept         44
metroprolol      43
tigecyclin       43
klazid           42
fenestil         40
leukovorin       36
prostigmin       35
amoxycillin      34
tetrazyklin      31
lorzar           30
atacant          30
cephalosporin    29
In 61% (n3/n2; see Figure 2) of the ETA-assisted searches (11% of all queries) one or more suggestions were displayed, leaving only 7% (n4=11939) of all queries
abortive. After searches with direct matching, 42.01% (green areas: 35.02% + 6.99%) yielded unequivocal hits; for ETA-supported searches the respective figure was 26.65%
(22.94% + 3.71%), suggesting that at least 63.4% (26.65/42.01) of the ETA-assisted searches yielded the expected result. Analysis of the 7% abortive searches revealed
entries of drugs not approved in Germany and other medical content (e.g. diseases or adverse events) as important fractions of failing queries.
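The percentages in this paragraph can be re-derived from the counts shown in Figure 2. The sketch below is a plain recomputation, with variable and function names chosen for illustration. It takes each share over the sum of the listed action counts, so the results agree with the poster's figures only up to rounding: about 42% confirmed after direct matches, about 26.6% after ETA suggestions, and a ratio of about 63.4%.

```python
# Counts transcribed from Figure 2 of the poster.
N_TOTAL = 174050            # all search queries
N2_NO_DIRECT_MATCH = 30693  # queries that started ETA
N3_SUGGESTIONS = 18754      # ETA displayed one or more suggestions
N4_NO_SUGGESTION = 11939    # ETA found nothing

# Actions after the result page, in the order:
# trade name, SPC icon, new search, abort, other.
DIRECT_ACTIONS = (49618, 9904, 39232, 19092, 23819)
ETA_ACTIONS = (4302, 695, 8322, 3056, 2379)


def confirmed_share(actions):
    """Share of searches followed by a trade name or SPC click."""
    return (actions[0] + actions[1]) / sum(actions)


suggestion_rate = N3_SUGGESTIONS / N2_NO_DIRECT_MATCH
eta_share_of_all = N3_SUGGESTIONS / N_TOTAL
abortive_share = N4_NO_SUGGESTION / N_TOTAL
direct_confirmed = confirmed_share(DIRECT_ACTIONS)
eta_confirmed = confirmed_share(ETA_ACTIONS)
# ETA confirmation rate relative to the rate after direct matches.
relative_success = eta_confirmed / direct_confirmed

print(f"suggestions displayed: {suggestion_rate:.0%} of ETA searches")
print(f"ETA with suggestions: {eta_share_of_all:.0%} of all queries")
print(f"abortive: {abortive_share:.0%} of all queries")
print(f"confirmed after direct match: {direct_confirmed:.1%}")
print(f"confirmed after ETA suggestion: {eta_confirmed:.1%}")
print(f"relative success of ETA: {relative_success:.1%}")
```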
Figure 1: Result of an error-tolerant search for “Amitriptylin”. The system suggests alternative
spellings and automatically displays the results for the best fit.
Figure 2 (data): 174050 search queries
- direct match: 81%, n1=141633
- no direct match, ETA started: 18%, n2=30693
- ETA, suggestions displayed: 61% of n2, n3=18754
- ETA, no suggestion: 39% of n2, n4=11939
- unusable data caused by script or database errors: 1%, n5=1724
Succeeding action      Direct match (sum n/n1)    ETA with suggestions (n/n3)
click on trade name    n=49618   35.02%           n=4302   22.94%
click on SPC icon      n=9904     6.99%           n=695     3.71%
new search             n=39232   27.69%           n=8322   44.37%
abort                  n=19092   13.48%           n=3056   16.30%
other                  n=23819   16.81%           n=2379   12.69%
Figure 2: The search results of 174050 queries entered into AiDKlinik were analysed. Direct matches and entries initiating ETA were further evaluated, and
the subsequent selection of a trade name or an SPC was judged as confirmation of successful data retrieval.
CONCLUSION
Typing errors and other types of erroneous data entry are abundant in health care. They often lead to retrieval failures and consequently to a waste of resources and
time. More importantly, they might also prevent access to data important for the adequate treatment of a patient at a given point in time. The frequency of misspellings
in this large number of queries suggests that an error-tolerant search algorithm is indispensable for any drug information system. The algorithm has to be a multistage
tool since conventional search algorithms are not suitable for medical terminology.



