GLARE 1.0 User's Guide

GLARE stands for Global Library Assessment of REagents. The concepts and details of this software are published in Jean-François Truchon and Christopher I. Bayly, Journal of Chemical Information and Modeling (2006), Vol. 46, pp. 1536-1548. GLARE takes large lists of reagents for a combinatorial library with two or more dimensions and produces reduced lists that satisfy 'goodness' criteria applied to the combinatorial products formed by those reagents. An example of a 'goodness' criterion is the Lipinski rule of five. The aim is not to make the final selection of reagents for synthesis, but to reduce the original reagent lists according to user criteria and to ensure that any reagent remaining after GLARE optimization will form a 'good' product with the reagents remaining in the other dimensions. In designing GLARE, we paid particular attention to retaining as many as possible of the reagents that are likely to lead to 'good' products. Often more than one property is considered, which makes GLARE a multi-objective optimizer. For example, GLARE can remove from the cherry-picking process the problem of forming libraries that are too greasy or too high in molecular weight, leaving the chemist more time for considerations that are reagent-oriented rather than product-oriented.

We tried hard to make this software generally applicable to real and difficult situations, such as immense combinatorial starting points (e.g. 1000x1000x1000x1000 products) and cases with large discrepancies between dimension list sizes (e.g. 100x6000 products). The timings reported in the reference above show that very large initial combinatorial libraries (>10¹² products) can be optimized on common workstations in less than 1 second up to 2 minutes.

Property offset calculation

In GLARE, the properties of the products are calculated by simple addition of the properties of the reagents. This avoids building the actual products and saves a lot of CPU time. This translates into the following equation:

P(product) = offset(P) + P(reagent 1) + P(reagent 2) + … + P(reagent D)

where P is the property considered, 'offset' is a correction that is specific to each property and D is the number of dimensions.


Compound: (1)	(2)	(3)
HBA: 1	2	2

Figure 1. LIB03 (GLARE reference) example.

Ex: HBA(3) = offset(HBA) + HBA(1) + HBA(2)

In the example of Figure 1, offset(HBA) is -1. The HBA of every product of LIB03 can therefore be calculated using the formula above with -1 as the offset, and the input files only need the properties of each reagent used. The only arithmetic GLARE performs to obtain a product property is to sum the property values of the reagents involved in that combinatorial product. The offset for each property can be calculated from a single example and supplied as described in the input data files section. The additivity of the properties is assumed; it has been verified for a number of simple properties such as the number of hydrogen bond acceptors, the number of hydrogen bond donors, logP, molecular weight, the number of non-hydrogen atoms, the polar surface area, the number of rotatable bonds, and more.
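As a minimal sketch of the additive scheme described above (this is not GLARE's own code, and the function names are hypothetical), the following Python derives an offset from one worked example and reuses it to estimate a product property:

```python
def derive_offset(product_value, reagent_values):
    """Offset = known product property minus the sum of the reagent properties."""
    return product_value - sum(reagent_values)


def product_property(offset, reagent_values):
    """Additive estimate: the offset plus one reagent value per dimension."""
    return offset + sum(reagent_values)


# Figure 1 values: HBA(1) = 1 and HBA(2) = 2 for the reagents, HBA(3) = 2 for the product.
hba_offset = derive_offset(2, [1, 2])
print(hba_offset)                            # -1
print(product_property(hba_offset, [1, 2]))  # 2
```

Once the offset is known from a single reagent/product example, only the reagent values are needed for every other product of the same library.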

Input data files

Each dimension is defined by a list of reagents. GLARE only needs the unique identifier of each reagent and its properties. The general syntax of a line is: REAGENT_ID [value]. For example, a reagent data file could be:

------------------------------------------------
MFCD00000297 1 1 0 24 3.4
MFCD00000357 1 1 0 16 3.5
MFCD00000452 1 1 0 6 0.8
…
-------------------------------------------------

The offset is supplied to GLARE in the same way: it is simply a dimension containing a single reagent whose properties are the offset values:

-------------------------------------------------
offset -1.0 -1.0 0.0 1.0 -1.0
-------------------------------------------------

Each column after the reagent_id corresponds to a property, and the filtering rule assumes that the column order is the same in all input data files. A dimension can be defined by more than one input data file, as outlined in the command input file section.
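The file layout above can be read with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and it assumes that fields are separated by whitespace and that blank lines can be ignored, neither of which is stated in this guide.

```python
def parse_reagent_lines(lines):
    """Return {reagent_id: [property values]} from 'REAGENT_ID value value ...' lines."""
    reagents = {}
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated fields
        if not fields:
            continue  # assumption: blank lines carry no data
        reagents[fields[0]] = [float(value) for value in fields[1:]]
    return reagents


sample = [
    "MFCD00000297 1 1 0 24 3.4",
    "MFCD00000357 1 1 0 16 3.5",
    "MFCD00000452 1 1 0 6 0.8",
]
reagents = parse_reagent_lines(sample)
offset = parse_reagent_lines(["offset -1.0 -1.0 0.0 1.0 -1.0"])
print(reagents["MFCD00000452"])  # [1.0, 1.0, 0.0, 6.0, 0.8]
print(offset["offset"])          # [-1.0, -1.0, 0.0, 1.0, -1.0]
```

The offset file is parsed exactly like a reagent file, which mirrors the way GLARE treats the offset as a dimension with a single entry.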

Command input file

The command file contains the data definition, that is, how the dimensions are combined, and the filtering rules to apply. It does not contain parameters that control the optimization; those are given as command line options, described later in this document. The next two tables describe the keywords recognized by GLARE.

Table 1. Command input file keywords syntax

DIMDEF	'DIMENSION ALIAS' [data file name]
DIMFREEZE	'DIMENSION ALIAS'
LIBDEF	'LIBRARY ALIAS' ['DIMENSION ALIAS']
PROPDEF	'PROPERTY ALIAS' MIN(float) MAX(float)
INPUTDEF	['PROPDEF']
#	Any comment text

[ ] means one or more elements

'a' means a variable definition that can be aliased in other definitions

Table 2. Command input file keywords definition and utility

DIMDEF	Defines a dimension by a list of input data files; the alias can be reused in LIBDEF or DIMFREEZE keywords.
DIMFREEZE	Identifies a dimension to be frozen. No reagent will be rejected in this dimension during optimization.
LIBDEF	Defines a combinatorial library. All the reagents defined in the associated dimensions are considered to form combinatorial products. If more than one LIBDEF keyword is used, the stop criterion of the optimization applies to all the products formed in all the combinatorial libraries. Several LIBDEF keywords are used when sub-libraries share the same dimension. In this case, the reagents kept after the optimization satisfy the goodness* criteria in all libraries.
PROPDEF	Defines a property alias with the minimum and maximum values that will be used to identify the products that pass the 'goodness' test.
INPUTDEF	Tells GLARE the order of the properties read in the input data files. This also defines how the 'goodness' rule is applied.
#	When it is the first character of a line, tells GLARE to ignore that line.

* see last section on definitions

Figure 2 and Figure 3 illustrate how to use these keywords in real examples. The first example shows how to set up GLARE to optimize a simple amide library in which the amine dimension comes from one input data file and the acid dimension contains the reagents from two distinct files, which are parsed and combined into one dimension. Four property ranges are defined, but only three properties are expected in the input data files, in the order specified by the INPUTDEF line. For example, if the input data files shown above are used, only the first three property columns are considered. These ranges define a multi-objective criterion that 'good' products must meet simultaneously.

DIMDEF	R1 ~/glare/example/amines.gli
DIMDEF	R2 ~/glare/example/acids1.gli ~/glare/example/acids2.gli
DIMDEF	Offset ~/glare/example/offset.gli
LIBDEF	AMIDES R1 R2 Offset
PROPDEF	HBA 0 10
PROPDEF	HBD 0 5
PROPDEF	MW 0 500.0
PROPDEF	logP -2.4 5.0
INPUTDEF	HBA HBD logP

Figure 2. Command input file example 1.
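To make the filtering rule of Figure 2 concrete, here is a small brute-force sketch in Python. It is an illustration only, not how GLARE evaluates libraries: the function names and the reagent property rows are hypothetical, and treating the MIN and MAX bounds as inclusive is an assumption.

```python
from itertools import product

# Ranges from Figure 2 for the properties named on the INPUTDEF line.
RANGES = {"HBA": (0, 10), "HBD": (0, 5), "logP": (-2.4, 5.0)}
INPUT_ORDER = ["HBA", "HBD", "logP"]  # column order after the reagent id


def is_good(reagent_rows):
    """True if every summed property lies inside its range (bounds assumed inclusive)."""
    for column, name in enumerate(INPUT_ORDER):
        low, high = RANGES[name]
        total = sum(row[column] for row in reagent_rows)
        if not low <= total <= high:
            return False
    return True


def goodness(dimensions):
    """Fraction of all combinatorial products that pass the filter."""
    results = [is_good(combo) for combo in product(*dimensions)]
    return sum(results) / len(results)


# Hypothetical (HBA, HBD, logP) rows; these are not real reagent data.
r1 = [[2, 1, 1.5], [6, 3, 2.0]]
r2 = [[3, 1, 2.0], [5, 4, 4.5]]
offset = [[-1.0, -1.0, 0.0]]
print(goodness([r1, r2, offset]))  # 0.5
```

Of the four products, two fail: one because the summed logP is 6.0, the other because the summed HBD is 6. Enumerating every product like this is only practical for tiny libraries.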

DIMDEF	R1A ~/glare/example/LIB03_R1_alcohol_Universe.gli
DIMDEF	R1B ~/glare/example/LIB03_R1_thiourea_Universe.gli
DIMDEF	R2 ~/glare/example/LIB03_R2_aminoacid_Universe.gli
DIMDEF	OFFSETA ~/glare/example/LIB03_alcohol_offset.gli
DIMDEF	OFFSETB ~/glare/example/LIB03_thiourea_offset.gli
LIBDEF	LIB03A R1A R2 OFFSETA
LIBDEF	LIB03B R1B R2 OFFSETB
PROPDEF	HBA 0 10
PROPDEF	HBD 0 5
PROPDEF	NONH 0 33
PROPDEF	MW 0 450.0
PROPDEF	LOGP -2.4 5.0
INPUTDEF	HBA HBD MW NONH LOGP

Figure 3. Command input file example 2: LIB03_definition

The LIB03_definition command input file of Figure 3 introduces multiple libraries that share a common set of reagents. The first dimension uses two different types of reagent, each combined through a different chemistry with a second dimension formed by amino acid reagents. This results in two LIBDEF definitions, which ask GLARE to form the products of both libraries while reusing R2 in each; that is, GLARE produces a single set of R2 reagents that works with both sub-libraries. The GLARE optimization requirements are therefore applied to all the defined libraries (LIBDEF keywords) at the same time. For example, a goodness threshold of 95% is reached when 95% of the products from LIB03A and LIB03B taken together pass the filtering rule. In the input data files, six columns are expected (molecule_id followed by 5 properties), and a 'good' product has the sum over its reagents of the first property column between 0 and 10, of the second between 0 and 5, of the third between 0 and 450.0, of the fourth between 0 and 33 and of the fifth between -2.4 and 5.0. Here is the trace of the GLARE optimization from the LIB03_definition command file:

GLARE execution on example 2:

> Glare.exe -i LIB03_definition

------- PARAMETERS: --------------
GOODNESS THRESHOLD : 95
MIN PARTITION SIZE : 16
INITIAL FRACTION TO KEEP: AUTOMATIC
ACTUAL SIZE LIB03A : 1887 x 1785 x 1 = 3.3683e+006
ACTUAL SIZE LIB03B : 91 x 1785 x 1 = 162435

------- ITERATION : 1 --------------
GOODNESS : 34%
NUMBER EVAL : 93959
CUMUL. EVAL. : 93959
KEPT IN STEP : 100 %
ACTUAL SIZE LIB03A : 1887 x 1785 x 1 = 3.3683e+006
EFFECTIVENESS LIB03A : 100%
ACTUAL SIZE LIB03B : 91 x 1785 x 1 = 162435
EFFECTIVENESS LIB03B : 100%
BRAVI EFFECT. : 100%

------- ITERATION : 2 --------------
GOODNESS : 67.6%
NUMBER EVAL : 50661
CUMUL. EVAL. : 144620
KEPT IN STEP : 53 %
ACTUAL SIZE LIB03A : 1041 x 999 x 1 = 1.03996e+006
EFFECTIVENESS LIB03A : 55.6%
ACTUAL SIZE LIB03B : 88 x 999 x 1 = 87912
EFFECTIVENESS LIB03B : 76.3%
BRAVI EFFECT. : 63.4%

------- ITERATION : 3 --------------
GOODNESS : 84.1%
NUMBER EVAL : 39909
CUMUL. EVAL. : 184529
KEPT IN STEP : 78.9 %
ACTUAL SIZE LIB03A : 831 x 801 x 1 = 665631
EFFECTIVENESS LIB03A : 44.5%
ACTUAL SIZE LIB03B : 87 x 801 x 1 = 69687
EFFECTIVENESS LIB03B : 70.2%
BRAVI EFFECT. : 51.4%

------- ITERATION : 4 --------------
GOODNESS : 88.4%
NUMBER EVAL : 36366
CUMUL. EVAL. : 220895
KEPT IN STEP : 91.6 %
ACTUAL SIZE LIB03A : 764 x 738 x 1 = 563832
EFFECTIVENESS LIB03A : 40.9%
ACTUAL SIZE LIB03B : 86 x 738 x 1 = 63468
EFFECTIVENESS LIB03B : 67.9%
BRAVI EFFECT. : 46.1%

------- ITERATION : 5 --------------
GOODNESS : 90.8%
NUMBER EVAL : 34493
CUMUL. EVAL. : 255388
KEPT IN STEP : 94.9 %
ACTUAL SIZE LIB03A : 727 x 703 x 1 = 511081
EFFECTIVENESS LIB03A : 39%
ACTUAL SIZE LIB03B : 85 x 703 x 1 = 59755
EFFECTIVENESS LIB03B : 66.4%
BRAVI EFFECT. : 43.1%

------- ITERATION : 6 --------------
GOODNESS : 91.3%
NUMBER EVAL : 33019
CUMUL. EVAL. : 288407
KEPT IN STEP : 96.7 %
ACTUAL SIZE LIB03A : 704 x 681 x 1 = 479424
EFFECTIVENESS LIB03A : 37.7%
ACTUAL SIZE LIB03B : 84 x 681 x 1 = 57204
EFFECTIVENESS LIB03B : 65.2%
BRAVI EFFECT. : 40.8%

------- ITERATION : 7 --------------
GOODNESS : 92.5%
NUMBER EVAL : 31813
CUMUL. EVAL. : 320220
KEPT IN STEP : 97.2 %
ACTUAL SIZE LIB03A : 685 x 663 x 1 = 454155
EFFECTIVENESS LIB03A : 36.7%
ACTUAL SIZE LIB03B : 83 x 663 x 1 = 55029
EFFECTIVENESS LIB03B : 64.2%
BRAVI EFFECT. : 39.2%

------- ITERATION : 8 --------------
GOODNESS : 93.2%
NUMBER EVAL : 31029
CUMUL. EVAL. : 351249
KEPT IN STEP : 98.1 %
ACTUAL SIZE LIB03A : 673 x 651 x 1 = 438123
EFFECTIVENESS LIB03A : 36.1%
ACTUAL SIZE LIB03B : 82 x 651 x 1 = 53382
EFFECTIVENESS LIB03B : 63.3%
BRAVI EFFECT. : 38.1%

------- ITERATION : 9 --------------
GOODNESS : 93.6%
NUMBER EVAL : 30180
CUMUL. EVAL. : 381429
KEPT IN STEP : 98.6 %
ACTUAL SIZE LIB03A : 664 x 642 x 1 = 426288
EFFECTIVENESS LIB03A : 35.6%
ACTUAL SIZE LIB03B : 81 x 642 x 1 = 52002
EFFECTIVENESS LIB03B : 62.5%
BRAVI EFFECT. : 37.2%

------- ITERATION : 10 --------------
GOODNESS : 94%
NUMBER EVAL : 29788
CUMUL. EVAL. : 411217
KEPT IN STEP : 98.9 %
ACTUAL SIZE LIB03A : 657 x 636 x 1 = 417852
EFFECTIVENESS LIB03A : 35.2%
ACTUAL SIZE LIB03B : 80 x 636 x 1 = 50880
EFFECTIVENESS LIB03B : 61.8%
BRAVI EFFECT. : 36.6%

------- ITERATION : 11 --------------
GOODNESS : 95%
NUMBER EVAL : 34453
CUMUL. EVAL. : 445670
KEPT IN STEP : 99.2 %
ACTUAL SIZE LIB03A : 652 x 631 x 1 = 411412
EFFECTIVENESS LIB03A : 35%
ACTUAL SIZE LIB03B : 79 x 631 x 1 = 49849
EFFECTIVENESS LIB03B : 61.1%
BRAVI EFFECT. : 36.5%

The optimization took 11 iterations and produced five output files: R1A.glo, R1B.glo, R2.glo, OFFSETA.glo and OFFSETB.glo. The OFFSETA.glo and OFFSETB.glo files simply repeat the corresponding input files. R1A.glo and R1B.glo contain the alcohol and thiourea reagents kept by GLARE to form two sub-libraries with 95% goodness overall. R2.glo contains the amino acids that work in both LIB03A and LIB03B 95% of the time. The output files are named after the DIMDEF aliases given in the command file. Before the first iteration, the important optimization parameters are printed. A state description follows for each iteration, containing the current goodness, the number of product evaluations performed at this iteration and the cumulative count, the percentage of reagents kept (Ki before the scaling is applied), the size and effectiveness of each sub-library, and the overall Bravi et al. effectiveness (see the GLARE paper for more details). The dimensions with one reagent are due to the offset input data files.

Command line options

Several command line options are available, but few of them normally need to be modified. This section describes all the options, with heuristics to solve some issues. Table 3 lists the options with their descriptions. The only mandatory option is "-i command_input_file_name". Most of the time GLARE is called as follows: glare.exe -i command_input_file (on Windows).

The option with the greatest effect on execution speed is "-p nbr_reagents_per_partition". The smaller the number of reagents per partition, the faster the optimization. The minimum value allowed is 2; there is no maximum because the largest partition would contain all the reagents, which is equivalent to no partitioning (-q). However, the smaller the number of reagents per partition, the smaller the reagent lists at the end of the optimization (lower effectiveness). In other words, to increase the number of reagents kept for a given goodness threshold, increase the number of reagents per partition. Extensive studies have shown that 16 reagents per partition is a good compromise, and this is the GLARE default.

Table 3. Command line options; most of them control the optimization.

Option	Argument type	Description
-i	[string]	This option specifies the input command file. It is mandatory. If this option is specified more than once, all the input command files are considered.
-o	[string]	The argument of this option specifies the output file name where the trace of the optimization is logged. Default: standard output (stdout).
-c	[float]	The argument of this option gives the goodness threshold that is used as a stop criterion. A value of 75.0 means 75%. Default: 95.
-m	[int]	Maximum number of iterations allowed. This is a stop criterion. Default: 100.
-p	[int]	Indicates that the optimization uses the partitioning scheme. The argument gives the minimum number of reagents per partition. Default: 16.
-q		Turns off the partitioning scheme.
-s	[float]	Uses the scaled pruning. The argument is the exponential parameter that sets the steepness of the switch-over function. Default: 6.0.
-n		At the first iteration, GLARE removes only the reagents that are not part of a single good product. This normally has no benefit and reduces the performance. Default: turned off.
-k	[float]	Fraction of reagents to keep at the first iteration. A value of 75 means that the worst 25% of the reagents are rejected and 75% are kept. Default: determined automatically by GLARE with an optimal guess. This option is the K⁰ parameter discussed in the GLARE reference article.

The command Glare.exe -i LIB03_definition used above is equivalent to Glare.exe -i LIB03_definition -m 100 -p 16 -c 95.0 -s 6.0. The -i option can appear more than once. This is useful when the PROPDEF and INPUTDEF keywords are defined once in a separate file, so that many users can share the same property definitions, for example: Glare.exe -i LIB03_definition -i property_definition_file.
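When GLARE is driven from a script, the command line can be assembled programmatically. The Python sketch below is a hypothetical helper, not part of GLARE; it uses only the options listed in Table 3 and only builds the argument list, so it can be tried without GLARE installed.

```python
def build_glare_command(command_files, max_iter=None, partition=None,
                        goodness=None, scale=None, executable="Glare.exe"):
    """Assemble a GLARE argument list from the options documented in Table 3."""
    args = [executable]
    for path in command_files:
        args += ["-i", path]  # -i may be given more than once
    if max_iter is not None:
        args += ["-m", str(max_iter)]
    if partition is not None:
        args += ["-p", str(partition)]
    if goodness is not None:
        args += ["-c", str(goodness)]
    if scale is not None:
        args += ["-s", str(scale)]
    return args


# Explicit form of the default run described above.
cmd = build_glare_command(["LIB03_definition"], max_iter=100,
                          partition=16, goodness=95.0, scale=6.0)
print(" ".join(cmd))
# Glare.exe -i LIB03_definition -m 100 -p 16 -c 95.0 -s 6.0
```

The resulting list could then be passed to subprocess.run from the Python standard library, assuming the GLARE executable is on the search path.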

Definitions

The goodness is the fraction of 'good' products in the current product set:

Goodness = ( number of products passing the filtering rule )/( total number of products )

The effectiveness reflects the number of reagents remaining relative to the initial number of reagents given to GLARE:

Effectiveness = 1/N * [ ( number of reagents in dimension 1 ) / ( initial number of reagents in dimension 1 ) + … + ( number of reagents in dimension N ) / ( initial number of reagents in dimension N ) ]
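The effectiveness formula can be checked against the iteration 2 values of the LIB03 trace with a short Python sketch (the function name is hypothetical). The reported percentages are reproduced when the single-reagent offset dimension is left out of the average; this is an inference from the numbers in the trace, not a documented rule.

```python
def effectiveness(current_sizes, initial_sizes):
    """Mean fraction of reagents kept over the dimensions supplied."""
    ratios = [cur / ini for cur, ini in zip(current_sizes, initial_sizes)]
    return sum(ratios) / len(ratios)


# LIB03A at iteration 2: 1041 of 1887 and 999 of 1785 reagents remain.
# The offset dimension (1 of 1) is omitted here; that is an assumption.
print(round(100 * effectiveness([1041, 999], [1887, 1785]), 1))  # 55.6

# LIB03B at iteration 2: 88 of 91 and 999 of 1785 reagents remain.
print(round(100 * effectiveness([88, 999], [91, 1785]), 1))      # 76.3
```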