GLARE stands for Global Library Assessment of REagents. The concepts and
details behind this software are published in Jean-François Truchon and
Christopher I. Bayly, Journal of Chemical Information and Modeling (2006)
Vol. 46, pp 1536-1548. The purpose of this software is to take large lists of
reagents in the context of a combinatorial library with two or more dimensions
and produce reduced lists that follow 'goodness' criteria applied to the
combinatorial products formed by those reagents. An example of a 'goodness'
criterion is the Lipinski rule of five. The idea is not to definitively select
the reagents for the synthesis, but rather to reduce the original reagent lists
based on user criteria and to ensure that any reagent remaining after GLARE
optimization will form a 'good' product with the reagents remaining in the
other dimensions. In designing GLARE, we paid close attention to retaining the
maximum number of reagents that are likely to lead to 'good' products. Often,
more than one property is considered, making GLARE a multi-objective optimizer.
For example, GLARE can take out of the cherry-picking equation the problem of
forming libraries that are too greasy or too high in molecular weight, helping
the chemist spend more time on other considerations that are reagent oriented
rather than product oriented.
We tried hard to make this software generally applicable in real and difficult
situations, such as immense combinatorial starting points (e.g.
1000x1000x1000x1000 products) and cases where the dimension list sizes differ
widely (e.g. 100x6000 products). The timings that we obtained, as shown in the
reference above, make the optimization of very large initial combinatorial
libraries (>10^12 products) possible on common workstations on a time scale
ranging from less than 1 s up to 2 minutes.
In GLARE, the properties of the products are calculated by simply adding the
properties of the reagents. This avoids building the actual products and saves a
lot of CPU time. This translates into the following equation:
P(product) = offset(P) + P(reagent 1) + … + P(reagent D)
where P is the property considered, 'offset' is a correction that is specific
to each property and D is the number of dimensions.
Compound | (1) | (2) | (3)
HBA | 1 | 2 | 2
Figure 1. LIB03 (GLARE reference) example.
Ex: HBA(3) = offset(HBA) + HBA(1) + HBA(2)
In the example of Figure 1, offset(HBA) is -1. The HBA of all the products of
LIB03 can therefore be calculated using the formula above with -1 as the
offset. As a result, the input files only require the properties of every
reagent used. The only math that GLARE does to calculate the product properties
is to sum the property values of the reagents involved in the combinatorial
product. The offset for each property considered can be calculated from a
single example and used as described in the input data files section. The
additivity of the properties is assumed and has been verified for a number of
simple properties such as the number of hydrogen bond acceptors, the number of
hydrogen bond donors, logP, molecular weight, the number of non-hydrogen atoms,
the polar surface area, the number of rotatable bonds, and more.
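The additive scheme can be illustrated with a minimal Python sketch. This is not GLARE's code: the function names `product_property` and `offset_from_example` are hypothetical, and the numbers are the HBA values of Figure 1.

```python
from typing import Sequence


def product_property(reagent_values: Sequence[float], offset: float) -> float:
    """Additive estimate: the offset plus the sum of one property over the reagents."""
    return offset + sum(reagent_values)


def offset_from_example(product_value: float, reagent_values: Sequence[float]) -> float:
    """Derive the offset from one product whose true property value is known."""
    return product_value - sum(reagent_values)


# Figure 1: HBA(1) = 1, HBA(2) = 2, HBA(3) = 2
hba_offset = offset_from_example(2, [1, 2])
print(hba_offset)                            # -1
print(product_property([1, 2], hba_offset))  # 2
```

Once the offset has been derived from one known product, the same value is reused for every product of the library.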
Each
dimension is defined by a list of reagents. The only thing that matters for
GLARE is the unique identifier of the reagent and its properties. The general
syntax is: REAGENT_ID [value]. For example, a reagent data file could be:
------------------------------------------------
MFCD00000297 1 1 0 24 3.4
MFCD00000357 1 1 0 16 3.5
MFCD00000452 1 1 0 6 0.8
…
-------------------------------------------------
The offset is incorporated in GLARE the same way: it is simply a dimension
containing a single reagent whose properties are the offsets:
-------------------------------------------------
offset -1.0 -1.0 0.0 1.0 -1.0
-------------------------------------------------
Each column after the reagent_id corresponds to a property, and the filtering
rule assumes that the column order is the same in all input data files. A
dimension can be defined by more than one input data file, as outlined in the
command input file section.
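As an illustration of this layout, the following Python sketch reads reagent records into a dictionary. It assumes one reagent per line with whitespace-separated fields (the identifier followed by its property values); the function name `parse_reagent_lines` is hypothetical and this is not GLARE's own reader.

```python
from typing import Dict, Iterable, List


def parse_reagent_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Map each reagent identifier to its list of property values."""
    reagents: Dict[str, List[float]] = {}
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        reagent_id, values = tokens[0], [float(t) for t in tokens[1:]]
        reagents[reagent_id] = values
    return reagents


sample = """\
MFCD00000297 1 1 0 24 3.4
MFCD00000357 1 1 0 16 3.5
MFCD00000452 1 1 0 6 0.8
"""
reagents = parse_reagent_lines(sample.splitlines())
print(reagents["MFCD00000452"])  # [1.0, 1.0, 0.0, 6.0, 0.8]
```

An offset file can be read by the same function, since it is just a one-reagent data file.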
The command file contains the data definition, that is, how the dimensions are
to be combined, and the filtering rules to apply. It does not contain
parameters that control the optimization; these belong to the command line
options described later in this document. The next two tables describe the
keywords recognized by GLARE.
Table 1. Command input file keywords syntax
DIMDEF | 'DIMENSION ALIAS' [data file name]
DIMFREEZE | 'DIMENSION ALIAS'
LIBDEF | 'LIBRARY ALIAS' ['DIMENSION ALIAS']
PROPDEF | 'PROPERTY ALIAS' MIN(float) MAX(float)
INPUTDEF | ['PROPDEF']
# | Any comment text
[ ] means one or more elements
'a' means a variable definition that can be aliased in other definitions
Table 2. Command input file keywords definition and utility
DIMDEF | Defines a dimension by a list of input data files; the alias can be reused in LIBDEF or DIMFREEZE keywords.
DIMFREEZE | Identifies a dimension to be frozen. No reagent will be rejected in this dimension during optimization.
LIBDEF | Defines a combinatorial library. All the reagents defined in the associated dimensions are considered to form combinatorial products. If more than one LIBDEF keyword is used, the stop criterion of the optimization applies to all the products formed in all the combinatorial libraries. More than one LIBDEF keyword is used when sub-libraries involve the same dimension in more than one library. In this case, the reagents kept after the optimization satisfy the goodness* criteria in all libraries.
PROPDEF | Defines a property alias with the minimum and maximum values that will be used to identify the products that pass the 'goodness' test.
INPUTDEF | Tells GLARE the order of the properties read in the input data file. This also defines how the 'goodness' rule is applied.
# | When it is the first character of a line, tells GLARE to ignore that line.
* see last section on definitions
Figure 2 and Figure 3 illustrate how to use these keywords in real cases. The
first example shows how to set up GLARE to optimize a simple amide library where
the amine dimension comes from one input data file and the acid dimension
contains the reagents from two distinct files that are parsed and combined
into one dimension. Four property ranges are defined, but only three are
expected in the input data file, in the order specified by the INPUTDEF line.
For example, if the input data files shown above are used, only the first three
property columns are considered. These ranges define multi-objective criteria
that the 'good' products must meet simultaneously.
DIMDEF    R1 ~/glare/example/amines.gli
DIMDEF    R2 ~/glare/example/acids1.gli ~/glare/example/acids2.gli
DIMDEF    Offset ~/glare/example/offset.gli
LIBDEF    AMIDES R1 R2 Offset
PROPDEF   HBA 0 10
PROPDEF   HBD 0 5
PROPDEF   MW 0 500.0
PROPDEF   logP -2.4 5.0
INPUTDEF  HBA HBD logP
Figure 2. Command input file example 1.
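To make the keyword semantics of Tables 1 and 2 concrete, here is a small Python sketch that reads the Figure 2 command file into a dictionary. It assumes whitespace-separated tokens with the keyword first; the function name `parse_command_lines` and the dictionary layout are hypothetical, and letting repeated DIMDEF lines for one alias accumulate files is an assumption, not documented GLARE behavior.

```python
from typing import Dict, Iterable


def parse_command_lines(lines: Iterable[str]) -> Dict[str, object]:
    """Collect DIMDEF, DIMFREEZE, LIBDEF, PROPDEF and INPUTDEF entries."""
    config = {"dims": {}, "frozen": [], "libs": {}, "props": {}, "input_order": []}
    for raw in lines:
        if not raw.strip() or raw.startswith("#"):
            continue  # blank line, or comment with '#' as first character
        keyword, *args = raw.split()
        if keyword == "DIMDEF":
            # alias followed by one or more data files
            config["dims"].setdefault(args[0], []).extend(args[1:])
        elif keyword == "DIMFREEZE":
            config["frozen"].append(args[0])
        elif keyword == "LIBDEF":
            config["libs"][args[0]] = args[1:]
        elif keyword == "PROPDEF":
            config["props"][args[0]] = (float(args[1]), float(args[2]))
        elif keyword == "INPUTDEF":
            config["input_order"] = args
        else:
            raise ValueError(f"unknown keyword: {keyword}")
    return config


figure2 = """\
# amide library of Figure 2
DIMDEF R1 ~/glare/example/amines.gli
DIMDEF R2 ~/glare/example/acids1.gli ~/glare/example/acids2.gli
DIMDEF Offset ~/glare/example/offset.gli
LIBDEF AMIDES R1 R2 Offset
PROPDEF HBA 0 10
PROPDEF HBD 0 5
PROPDEF MW 0 500.0
PROPDEF logP -2.4 5.0
INPUTDEF HBA HBD logP
"""
config = parse_command_lines(figure2.splitlines())
# Ranges applied to the data columns, in INPUTDEF order (MW is defined but unused)
active_ranges = [config["props"][name] for name in config["input_order"]]
print(active_ranges)  # [(0.0, 10.0), (0.0, 5.0), (-2.4, 5.0)]
```

The last two lines show why four PROPDEF ranges can coexist with three data columns: only the aliases listed on the INPUTDEF line are matched to columns.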
DIMDEF    R1A ~/glare/example/LIB03_R1_alcohol_Universe.gli
DIMDEF    R1B ~/glare/example/LIB03_R1_thiourea_Universe.gli
DIMDEF    R2 ~/glare/example/LIB03_R2_aminoacid_Universe.gli
DIMDEF    OFFSETA ~/glare/example/LIB03_alcohol_offset.gli
DIMDEF    OFFSETB ~/glare/example/LIB03_thiourea_offset.gli
LIBDEF    LIB03A R1A R2 OFFSETA
LIBDEF    LIB03B R1B R2 OFFSETB
PROPDEF   HBA 0 10
PROPDEF   HBD 0 5
PROPDEF   NONH 0 33
PROPDEF   MW 0 450.0
PROPDEF   LOGP -2.4 5.0
INPUTDEF  HBA HBD MW NONH LOGP
Figure 3. Command input file example 2: LIB03_definition
The LIB03_definition command input file of Figure 3 incorporates the notion of
multiple libraries sharing a common set of reagents. Here the first dimension
uses two different types of reagent, each combined through a different
chemistry with a second dimension formed by amino acid reagents. This results
in two LIBDEF definitions, which simply ask GLARE to form the products of those
two libraries reusing R2 in both; that is, GLARE will produce a unique set of R2
reagents that works with both sub-libraries. The GLARE optimization requirements
are therefore applied to all the defined libraries (LIBDEF keywords) at the
same time. For example, a threshold of 95% goodness is achieved if the products
from LIB03A and LIB03B pass the filtering rule 95% of the time. In the input
data file, six columns are expected (molecule_id followed by 5 properties), and
a 'good' product has the sum of the first property column over all its
associated reagents between 0 and 10, the sum of the second property column
between 0 and 5, the third between 0 and 450.0, the fourth between 0 and 33 and
the fifth between -2.4 and 5.0. Here is the trace of the GLARE optimization from
the LIB03_definition command file:
GLARE execution on example 2:
> Glare.exe -i LIB03_definition
------- PARAMETERS: --------------
GOODNESS THRESHOLD : 95
MIN PARTITION SIZE : 16
INITIAL FRACTION TO KEEP: AUTOMATIC
ACTUAL SIZE LIB03A : 1887 x 1785 x 1 = 3.3683e+006
ACTUAL SIZE LIB03B : 91 x 1785 x 1 = 162435
------- ITERATION : 1 --------------
GOODNESS : 34%
NUMBER EVAL : 93959
CUMUL. EVAL. : 93959
KEPT IN STEP : 100 %
ACTUAL SIZE LIB03A : 1887 x 1785 x 1 = 3.3683e+006
EFFECTIVENESS LIB03A : 100%
ACTUAL SIZE LIB03B : 91 x 1785 x 1 = 162435
EFFECTIVENESS LIB03B : 100%
BRAVI EFFECT. : 100%
------- ITERATION : 2 --------------
GOODNESS : 67.6%
NUMBER EVAL : 50661
CUMUL. EVAL. : 144620
KEPT IN STEP : 53 %
ACTUAL SIZE LIB03A : 1041 x 999 x 1 = 1.03996e+006
EFFECTIVENESS LIB03A : 55.6%
ACTUAL SIZE LIB03B : 88 x 999 x 1 = 87912
EFFECTIVENESS LIB03B : 76.3%
BRAVI EFFECT. : 63.4%
------- ITERATION : 3 --------------
GOODNESS : 84.1%
NUMBER EVAL : 39909
CUMUL. EVAL. : 184529
KEPT IN STEP : 78.9 %
ACTUAL SIZE LIB03A : 831 x 801 x 1 = 665631
EFFECTIVENESS LIB03A : 44.5%
ACTUAL SIZE LIB03B : 87 x 801 x 1 = 69687
EFFECTIVENESS LIB03B : 70.2%
BRAVI EFFECT. : 51.4%
------- ITERATION : 4 --------------
GOODNESS : 88.4%
NUMBER EVAL : 36366
CUMUL. EVAL. : 220895
KEPT IN STEP : 91.6 %
ACTUAL SIZE LIB03A : 764 x 738 x 1 = 563832
EFFECTIVENESS LIB03A : 40.9%
ACTUAL SIZE LIB03B : 86 x 738 x 1 = 63468
EFFECTIVENESS LIB03B : 67.9%
BRAVI EFFECT. : 46.1%
------- ITERATION : 5 --------------
GOODNESS : 90.8%
NUMBER EVAL : 34493
CUMUL. EVAL. : 255388
KEPT IN STEP : 94.9 %
ACTUAL SIZE LIB03A : 727 x 703 x 1 = 511081
EFFECTIVENESS LIB03A : 39%
ACTUAL SIZE LIB03B : 85 x 703 x 1 = 59755
EFFECTIVENESS LIB03B : 66.4%
BRAVI EFFECT. : 43.1%
------- ITERATION : 6 --------------
GOODNESS : 91.3%
NUMBER EVAL : 33019
CUMUL. EVAL. : 288407
KEPT IN STEP : 96.7 %
ACTUAL SIZE LIB03A : 704 x 681 x 1 = 479424
EFFECTIVENESS LIB03A : 37.7%
ACTUAL SIZE LIB03B : 84 x 681 x 1 = 57204
EFFECTIVENESS LIB03B : 65.2%
BRAVI EFFECT. : 40.8%
------- ITERATION : 7 --------------
GOODNESS : 92.5%
NUMBER EVAL : 31813
CUMUL. EVAL. : 320220
KEPT IN STEP : 97.2 %
ACTUAL SIZE LIB03A : 685 x 663 x 1 = 454155
EFFECTIVENESS LIB03A : 36.7%
ACTUAL SIZE LIB03B : 83 x 663 x 1 = 55029
EFFECTIVENESS LIB03B : 64.2%
BRAVI EFFECT. : 39.2%
------- ITERATION : 8 --------------
GOODNESS : 93.2%
NUMBER EVAL : 31029
CUMUL. EVAL. : 351249
KEPT IN STEP : 98.1 %
ACTUAL SIZE LIB03A : 673 x 651 x 1 = 438123
EFFECTIVENESS LIB03A : 36.1%
ACTUAL SIZE LIB03B : 82 x 651 x 1 = 53382
EFFECTIVENESS LIB03B : 63.3%
BRAVI EFFECT. : 38.1%
------- ITERATION : 9 --------------
GOODNESS : 93.6%
NUMBER EVAL : 30180
CUMUL. EVAL. : 381429
KEPT IN STEP : 98.6 %
ACTUAL SIZE LIB03A : 664 x 642 x 1 = 426288
EFFECTIVENESS LIB03A : 35.6%
ACTUAL SIZE LIB03B : 81 x 642 x 1 = 52002
EFFECTIVENESS LIB03B : 62.5%
BRAVI EFFECT. : 37.2%
------- ITERATION : 10 --------------
GOODNESS : 94%
NUMBER EVAL : 29788
CUMUL. EVAL. : 411217
KEPT IN STEP : 98.9 %
ACTUAL SIZE LIB03A : 657 x 636 x 1 = 417852
EFFECTIVENESS LIB03A : 35.2%
ACTUAL SIZE LIB03B : 80 x 636 x 1 = 50880
EFFECTIVENESS LIB03B : 61.8%
BRAVI EFFECT. : 36.6%
------- ITERATION : 11 --------------
GOODNESS : 95%
NUMBER EVAL : 34453
CUMUL. EVAL. : 445670
KEPT IN STEP : 99.2 %
ACTUAL SIZE LIB03A : 652 x 631 x 1 = 411412
EFFECTIVENESS LIB03A : 35%
ACTUAL SIZE LIB03B : 79 x 631 x 1 = 49849
EFFECTIVENESS LIB03B : 61.1%
BRAVI EFFECT. : 36.5%
The optimization proceeded in 11 iterations and five output files were
produced: R1A.glo, R1B.glo, R2.glo, OFFSETA.glo and OFFSETB.glo. The OFFSETA.glo
and OFFSETB.glo files are simply copies of the related input files. The files
R1A.glo and R1B.glo contain the alcohol and the thiourea reagents kept by GLARE
to form two sub-libraries with 95% goodness overall. R2.glo contains the amino
acids that will work in both LIB03A and LIB03B 95% of the time. The output
files are named after the DIMDEF aliases given in the command file. Before the
first iteration, the important optimization parameters are printed. They are
followed by a state description for each iteration, which contains the current
goodness, the number of product evaluations performed at this iteration and the
cumulative count, the percentage of reagents kept (Ki before the scaling is
applied), the size and effectiveness of each sub-library defined and the
overall Bravi et al. effectiveness (see the GLARE paper for more details). The
dimensions with one reagent are due to the offset input data files.
Several command line options are available, but few of them should normally be
modified. This section describes all the options, with heuristics to solve some
issues. Table 3 lists all the options with their descriptions. The only
mandatory option is "-i command_input_file_name". Most of the time
GLARE is called the following way: glare.exe -i command_input_file (on
Windows).
The option that most affects the speed of execution is "-p
nbr_reagents_per_partition". The smaller the number of reagents per partition,
the faster the optimization. The minimum value allowed is 2 and there is no
maximum value, because the largest partition would contain all the reagents,
which is equivalent to no partitioning (-q). However, the smaller the number of
reagents per partition, the smaller the reagent lists at the end of the
optimization (lower effectiveness). In other words, to increase the number of
reagents kept for a given goodness threshold, one needs to increase the number
of reagents per partition. Extensive studies have shown that 16 reagents per
partition is a good compromise. This is the default behavior of GLARE.
Table 3. The command line options mainly control the optimization.
Option | Argument type | Description
-i | [string] | Specifies the input command file. It is mandatory. If this option is specified more than once, all the input command files are considered.
-o | [string] | Specifies the output file name where the trace of the optimization is logged. Default: standard output (stdout).
-c | [float] | Gives the goodness threshold that is used as a stop criterion. A value of 75.0 means 75%. Default: 95.
-m | [int] | Maximum number of iterations allowed. This is a stop criterion. Default: 100.
-p | [int] | Indicates that the optimization uses the partitioning scheme. The argument gives the minimum number of reagents per partition. Default: 16.
-q | (none) | Turns off the partitioning scheme.
-s | [float] | Uses the scaled pruning. The argument is the exponential parameter that decides the steepness of the switch-over function. Default: 6.0.
-n | (none) | At the first iteration, GLARE removes only the reagents that are not part of a single good product. This normally has no benefit and reduces the performance. Default: turned off.
-k | [float] | Fraction of reagents to keep at the first iteration. A value of 75 means that the worst 25% of the reagents are rejected and 75% are kept. Default: determined automatically by GLARE with an optimal guess. This option is the K0 parameter discussed in the GLARE reference article.
The command Glare.exe -i LIB03_definition above is equivalent to Glare.exe -i LIB03_definition -m 100 -p 16 -c 95.0 -s 6.0. Note that the -i option can appear more than once. This is useful when the PROPDEF and INPUTDEF keywords are defined once in a single shared file, so that many users can use the same property definitions, for example: Glare.exe -i LIB03_definition -i property_definition_file.
The goodness reflects the ratio of 'good' products in the actual product set:
Goodness = ( number of products passing the filtering rule ) / ( total number of products )
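The goodness ratio can be sketched in Python on a toy library. The sketch below enumerates every product, which is for illustration only: the trace above shows far fewer evaluations than products, so GLARE itself does not work this way. The function names, the toy property values and the choice of inclusive range bounds are all assumptions.

```python
from itertools import product
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def is_good(values: Vector, ranges: Sequence[Tuple[float, float]]) -> bool:
    """A product is 'good' only if every property lies in its range (bounds assumed inclusive)."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(values, ranges))


def goodness(dimensions: List[List[Vector]], ranges: Sequence[Tuple[float, float]]) -> float:
    """Fraction of all combinatorial products that pass every range."""
    total = passed = 0
    for combo in product(*dimensions):
        # Additive product properties: sum each property column over the reagents
        summed = [sum(column) for column in zip(*combo)]
        total += 1
        passed += is_good(summed, ranges)
    return passed / total if total else 0.0


# Hypothetical reagents with two properties each (HBA, HBD)
r1 = [[2, 1], [8, 3]]
r2 = [[1, 1], [4, 3]]
offset = [[-1, -1]]            # one-reagent offset dimension
ranges = [(0, 10), (0, 5)]     # HBA 0 10, HBD 0 5

print(goodness([r1, r2, offset], ranges))  # 0.75
```

Three of the four products pass; the fourth sums to an HBA of 11 and fails, even though its HBD is within range, which shows the simultaneous nature of the multi-objective test.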
The effectiveness reflects the number of reagents remaining relative to
the initial number of reagents given to GLARE:
Effectiveness = 1/N * [ ( number of reagents in dimension 1 ) / ( initial number of reagents in dimension 1 ) + … + ( number of reagents in dimension N ) / ( initial number of reagents in dimension N ) ]
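As a worked check of this formula, the Python sketch below recomputes the effectiveness of LIB03A at iteration 2 of the trace (1041 of 1887 and 999 of 1785 reagents remaining). The function name `effectiveness` is hypothetical. Averaging over the two reagent dimensions alone reproduces the 55.6% printed in the trace, whereas including the one-reagent offset dimension does not; that the offset dimension is left out of N is an inference from these numbers, not something stated in this document.

```python
from typing import Sequence, Tuple


def effectiveness(sizes: Sequence[Tuple[int, int]]) -> float:
    """Mean, over the dimensions, of remaining reagents / initial reagents."""
    return sum(remaining / initial for remaining, initial in sizes) / len(sizes)


# LIB03A at iteration 2: (remaining, initial) per dimension
reagent_dims = [(1041, 1887), (999, 1785)]
with_offset = reagent_dims + [(1, 1)]

print(f"{effectiveness(reagent_dims):.1%}")  # 55.6%
print(f"{effectiveness(with_offset):.1%}")   # 70.4%
```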