I recently found myself synthesizing a simple molecule that can be used as a photoacid, namely 4-formyl-6-methoxy-3-nitrophenoxyacetic acid, which is accessible in two simple steps from vanillin*1.

I am always surprised when I can make such interesting and useful molecules so darn easily and cheaply. So I am interested in a method that could produce an exhaustive list of all such molecules, i.e. every small molecule that is synthetically accessible in fewer than five steps from a pool of cheap starting materials, using only “easy” reactions.

I have a few questions and would welcome criticism and comments, so I detail my idea below.

What Are Cheap Starting Materials?
Naturally, the definition of “cheap” depends on supply and demand (which also drives the discovery of new chemical knowledge), but I think a good criterion would be anything I could buy on the gram scale for under $100. This list would include things like vanillin, citronellal, carvone, industrial dyes, natural amino acids, common acids and bases (sulphuric acid, sodium hydroxide), cyclopentadiene, benzene, acetone, ethanol, methanol, etc. I think you get the idea.

It is surprisingly difficult to construct such a list of starting materials (as a list of SMILES strings). Aldrich does not seem to have the ability to query by price in their internal database system (or so they tell me, at least). If anyone can think of a better way to do this than burning my eyes out in front of ChemDraw and an Aldrich catalog for a few weekends, please let me know. I think 1000 compounds would be a good place to start. If anyone else would find value in such a list, I’d put up a wiki/database linked to suppliers.
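As a rough sketch of the filtering step, assume the catalog data could be exported as CSV at all. The column names `name`, `smiles`, `price_usd` and `pack_size_g` are hypothetical, not any supplier's real format, and the prices below are invented placeholders, not real quotes. The “under $100 on the gram scale” criterion is approximated here as price per gram:

```python
import csv
import io

# Hypothetical catalog export. Column names are assumptions and the
# prices are invented placeholders, not real supplier quotes.
CATALOG_CSV = """name,smiles,price_usd,pack_size_g
vanillin,COc1cc(C=O)ccc1O,30.0,100
carvone,CC1=CCC(CC1=O)C(=C)C,45.0,25
expensive_ligand,CC(C)(C)P(C(C)(C)C)C(C)(C)C,250.0,1
"""

def cheap_smiles(csv_text, max_usd_per_gram=100.0):
    """Return the SMILES of entries whose price per gram is below the cutoff."""
    reader = csv.DictReader(io.StringIO(csv_text))
    result = []
    for row in reader:
        per_gram = float(row["price_usd"]) / float(row["pack_size_g"])
        if per_gram < max_usd_per_gram:
            result.append(row["smiles"])
    return result

print(cheap_smiles(CATALOG_CSV))
# ['COc1cc(C=O)ccc1O', 'CC1=CCC(CC1=O)C(=C)C']
```

Treating the cutoff as price per gram is only one reading of the criterion; a stricter version would require the smallest available pack to cost under $100.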

What Are Easy Reactions?
Sharpless et al. define the notion of “Click Chemistry”*2 as a set of reactions that have been demonstrated to possess very favorable exergonicities in multiple contexts (among a few other qualifiers). The basic idea is that these reactions tend to “just work”; examples include the Diels-Alder reaction, the acetylene-azide dipolar cycloaddition, and epoxide formation/opening. While I wouldn’t necessarily limit myself strictly to “click” reactions, the idea is to restrict the reaction set to good, easy ones.

I haven’t run across an elegant way to specify chemical reactions in detail. I am collaborating with Gabriel Valiente, who has implemented a basic reaction facility in PerlMol. He has also wondered whether there is some sort of algebra that would give us smaller descriptions, let us specify branched outcomes, etc., by taking advantage of the fact that most reactions have very “natural” atom mappings, as opposed to the completely explicit mappings we are using right now. I know that Sertanty Inc. has something neat in the works, but since it is commercial, I don’t know much about it.
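One possible compact encoding, sketched purely as an illustration and not as what PerlMol or Sertanty actually do, treats a reaction as a sparse set of bond-order changes between the few mapped atoms at the reaction center, while every other atom keeps its identity implicitly (loosely in the spirit of the Dugundji-Ugi matrix model). The example is a hypothetical O-alkylation of a phenol by a chloroacetic acid fragment with hand-numbered atoms; it ignores charges, implicit hydrogens, bases and substructure matching.

```python
from typing import Dict, FrozenSet, Tuple

Bond = FrozenSet[int]  # unordered pair of atom indices

def apply_bond_changes(bonds: Dict[Bond, int],
                       changes: Dict[Tuple[int, int], int]) -> Dict[Bond, int]:
    """Return a new bond table with each bond-order delta applied.

    Bonds not mentioned in `changes` are copied unchanged (the implicit,
    "natural" part of the mapping); a bond whose order reaches zero is deleted.
    """
    new_bonds = dict(bonds)
    for (a, b), delta in changes.items():
        key = frozenset((a, b))
        order = new_bonds.get(key, 0) + delta
        if order < 0:
            raise ValueError(f"bond {a}-{b} would have negative order")
        if order == 0:
            new_bonds.pop(key, None)
        else:
            new_bonds[key] = order
    return new_bonds

# Hypothetical hand numbering: 0 phenolic O, 1 its H, 2 CH2 carbon,
# 3 Cl, 4 aryl carbon bearing the O, 5 carboxyl carbon.
reactant_bonds = {
    frozenset((4, 0)): 1,  # Ar-O (spectator)
    frozenset((0, 1)): 1,  # O-H
    frozenset((2, 3)): 1,  # CH2-Cl
    frozenset((2, 5)): 1,  # CH2-COOH (spectator)
}
# The whole reaction rule is four deltas on reaction-center atoms.
o_alkylation = {(0, 1): -1, (2, 3): -1, (0, 2): +1, (1, 3): +1}

product_bonds = apply_bond_changes(reactant_bonds, o_alkylation)
print(sorted(tuple(sorted(b)) for b in product_bonds))
# [(0, 2), (0, 4), (1, 3), (2, 5)]
```

Here the spectator bonds (Ar-O, CH2-COOH) never appear in the rule, which is the kind of saving over a fully explicit mapping the paragraph asks about; a real system would still need a matcher to find the reaction center in each molecule.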

What Good Would It Be?

The final result would come from iteratively applying our easy reactions to our cheap starting materials (filtering out conformationally implausible beasts at each step) to generate a list of compounds and plausible synthetic routes to them. It’d be interesting to look at the comparative molecular diversity of the final list, and at statistics about the synthetic routes.
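The iteration described above can be sketched as a breadth-first forward search. In this minimal sketch molecules are plain strings and reactions are Python functions; the toy `couple` reaction and the length-based `is_plausible` filter are hypothetical stand-ins for real reaction templates and a real plausibility check from a cheminformatics toolkit. Each product records the depth at which it was first reached plus the reaction and reactants that made it, so routes can be read back out.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

Reaction = Tuple[str, int, Callable[..., Iterable[str]]]  # (name, arity, apply)
Route = Tuple[int, str, Tuple[str, ...]]                  # (depth, reaction, reactants)

def enumerate_library(starting: List[str], reactions: List[Reaction],
                      max_steps: int,
                      is_plausible: Callable[[str], bool]) -> Dict[str, Route]:
    """Breadth-first forward synthesis up to max_steps reaction layers."""
    routes: Dict[str, Route] = {m: (0, "purchased", ()) for m in starting}
    for step in range(1, max_steps + 1):
        pool = list(routes)
        new: Dict[str, Route] = {}
        for name, arity, apply in reactions:
            for reactants in product(pool, repeat=arity):
                # Combinations of older compounds were already tried last step.
                if all(routes[r][0] < step - 1 for r in reactants):
                    continue
                for p in apply(*reactants):
                    if p not in routes and p not in new and is_plausible(p):
                        new[p] = (step, name, reactants)
        if not new:
            break  # nothing new is reachable
        routes.update(new)
    return routes

def route_to(target: str, routes: Dict[str, Route]) -> List[str]:
    """List the reaction steps leading to target, precursors first."""
    _, name, reactants = routes[target]
    steps: List[str] = []
    for r in reactants:
        steps.extend(route_to(r, routes))
    if reactants:
        steps.append(f"{' + '.join(reactants)} --{name}--> {target}")
    return steps

def couple(x: str, y: str) -> List[str]:
    # Toy "reaction": join two labels; skip the mirrored pair to avoid duplicates.
    return [x + y] if x <= y else []

routes = enumerate_library(["A", "B"], [("couple", 2, couple)], max_steps=2,
                           is_plausible=lambda m: len(m) <= 3)
for line in route_to("AAB", routes):
    print(line)
# A + B --couple--> AB
# A + AB --couple--> AAB
```

With real chemistry the combinatorial `product(pool, repeat=arity)` loop is the expensive part, so in practice reactant pools would probably be pre-filtered by the functional groups each reaction needs.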

Again, please let me know if you have any ideas about:

a) How to get a large list of cheap compounds in the form of SMILES strings.
b) A better way of specifying chemical reactions than explicit atom mapping (software etc.).

