E.B. Rudnyi, V.V. Kuzmenko, and G.F. Voronin

Chemistry Department, Moscow State University, 199899 Moscow, Russia

Presented at Molecular Simulation '99 Electronic Conference

The advances of material science depends heavily on reliable thermodynamic databases which allow us to optimize the synthesis conditions for the given material and to predict its behavior in different mediums.

Thermodynamic properties required, in principle, can be obtained during
two steps from first principles. *Ab initio* calculations give us the
potential surface for the system in question, and then statistical
thermodynamics or molecular dynamics brings forward the thermodynamic
functions. In practice, these two steps are separated from each other, and
quite often for the second step the interaction potential is taken not from
*ab initio* calculations but rather from semi-empirical approach (for
example, molecular mechanics).

Both tasks are usually referred to as direct problems. That is, there is some mathematical problem to solve (Schroedinger equation, configuration integral or motion equation), and all the parameters within it are assumed to be known.

However if we speak about quantitative results, the precision of non-empirical calculations still remains quite unsatisfactory. When the solution of the real practical problem in material science is required, no one would bet only on the results obtained by the approaches described above, because the precision of the experimental measurements available is by orders of magnitude higher than those from non-empirical calculations. On the other side, the experimental values by themselves do not give us the full answer either. What is necessary here is multi-dimensional interpolation or even extrapolation, and in order to be reliable this should be based on some physical model.

Thus, the only reasonable way to proceed is to switch to an inverse problem. Here the results obtained by computational chemistry are treated as qualitative. They give us a clue what the functional form may look like, but the parameters within it are assumed to be unknown and they should be determined from the experimental results available. Along this way, we can achieve the description of Nature within the experimental precision which currently is rather high and at the same time to predict the behavior of the system at the conditions where the experiments can not been carried out.

In modern material science the CALPHAD group (see papers in the journal under the same name) presents rather successful example of employing the inverse problem for numerous tasks in metallurgy, ceramics, geology, semiconductors, high-temperature superconductors and other areas. There are several integrated commercial packages available as well as public domain and shareware software (for example, see lists [1, 2]) based on the rather reliable databases of thermodynamic properites obtained during the inverse problem - a simultaneous assessment of the Gibbs energy of phases from heterogeneous experimental values (thermodynamic properties and phase diagrams).

The main stress in the current work is given to one general problem arising in the inverse problem: taking into account systematic errors which are inevitable in real experimental measurements. The practical examples of the successful use of the approach suggested are given.

The inverse problem starts with a set of experimental values obtained by different authors and by different experimental methods. For most cases, they can be expressed as follows

{

y} (1)_{ij}, x_{ij}, z_{i}

*i* enumerates heterogeneous experiments, *i* = 1, ...,
*N _{i}*,

Note that even when an experiment is multidimensional, the experimenter usually changes a single variable during a single run.

It is assumed that there is some model that can describe all the experimental results

y=_{ij}f(_{i}x,_{ij}z; Teta) +_{i}e(2)_{ij}

*f _{i}* is a relationship between input and output in the
given

Formally speaking, the task of the simultaneous assessment is to obtain a
set of unknown parameters in Eq. (2) that gives the best description of the
original experimental values. The main problem lies in question what should
be considered as the best description of the experimental points. Actually
the number of unknowns in the system (2) is always greater that the number of
equations because experimental errors e* _{ij}* are also unknown. Therefore, there
is an infinite number of solutions and which one should be taken as the best
strongly depends on our considerations of errors. The latter will be referred
to as the error model and should not be confused with the physico-chemical
model taken by itself.

The conventional approach is to employ the weighted least squares method,
*i.e.*, to find such a solution that brings the sum of squares of the
errors

SS = S_{ij}e_{ij}^{2}*W _{ij}* =
e'

to the minimum. In matrix notation e is the
vector that comprises all the errors e* _{ij}* from all the experiments (the
number of elements S

The problem that has got no clear answer in the weighted least squares is how to assess the weights. The final solution certainly depends on the values of the accepted weights and a different set of weights would lead us to the different solution. Consequently, in the weighted least squares method the task of the simultaneous assessment becomes mainly of that of the weight assignment. Mathematical statistics gives a guideline such that in order to obtain the reliable solution the weight matrix should be equal to inverse of the dispersion matrix of the error vector up to the factor

**W** = *k* **D**(**e**)^{-1}
(4)

We can proceed from Eq. (4) to the weight least squares if all the errors
e* _{ij}* are postulated to be
non-correlated (only in this case the dispersion matrix takes the diagonal
form) and ratios between variances for all the errors are known

If variances are not known the maximum likelihood method allows us to determine both unknown parameters (vector Teta) and variances simultaneously by means of maximizing the likelihood function. This permits us to drop the requirement for variance ratios to be known. Provided all the errors are described by the multinormal distribution the maximum of the likelihood function coincides with the maximum of

L = - ln {det[**D**(e)]} - e'**D**(e)^{-1}e
(5)

Note that the weighted least squares method is a special case of maximizing Eq. (5) when the variance components are known up to a constant, that is the maximum of (5) matches the minimum of (3) provided there are no other unknowns inside the dispersion matrix but the general factor (Eq. 4).

The linear error model

e=_{ij}e+_{r,ij}e+_{a,i}e(_{b,i}x-_{ij}x) (6)_{i}

*e _{ij}* is the total error,

has been recently introduced [3]. It is assumed that the total
experimental error e* _{ij}* consists
not only from the reproducibility error e

The practical reason for introducing new two terms in Eq. (6) is that the
results of the distinct experiments usually differ more between each other
than the reproducibility error in a single experiment. Formally speaking,
there is a statistically significant difference between distinct experiments,
*i.e.*, the ratio of the corresponding sum of squares is more than
Fisher's criterion allows. Then the systematic errors introduced above permit
us to treat this situation by means of formal statistical procedures.

Eq. (6) was designed for one-dimensional tasks. Fortunately, it can be applied for most practical problems because, as was mentioned above, under the conventional approach only one variable is changed within a single run. Thus all the experiments can be usually treated as pseudo-one-dimensional. As experimental material science switches to multidimensional experimental design (modern analytical chemistry appears to be doing so), the linear error model may need to be modified.

The linear error model brings forth the dispersion matrix of experimental
errors, **D**(e) in the block-diagonal form
(see details in [3]). Each block corresponds to the single experiment and its
elements are functions of three variance components

D(e* _{r,ij}*) =
s

The considerations above allow us to set up a task as follows. For the given experimental points (1), it is necessary to determine vector Teta with unknown parameters in the physico-chemical model and unknown variance components contained in the dispersion matrix simultaneously. The maximum likelihood method provides a framework to achieve this goal and also the criterion for the best solution for the system (2). The algorithm for maximizing Eq. (5) under the linear error model is described in Ref. [3].

- Combined Processing of Experimental Series with Systematic Errors - Non-Linear Physico-Chemical Model with Linear Error Model [3].
- Simultaneous Assessment of Thermodynamic Properties of Calcium Auminates[5]
- Thermodynamic Assessment of the Ba-Cu-Y System [6].
- Simultaneous assessment of YBa
_{2}Cu_{3}O_{6+z}phase thermodynamics [7].

- T.M. Gordon. Software and Data for Thermodynamics and Phase Equilibrium Calculations in Geology, http://ichor.geo.ucalgary.ca/~tmg/Research/thermo_links.html.
- C.W. Bale. Web Sites in Inorganic Chemical Thermodynamics, http://www.crct.polymtl.ca/fact/websites.htm.
- E.B. Rudnyi. Statistical model of systematic errors: linear error
model. Chemometrics and Intelligent Laboratory Systems. 1996, v. 34, N 1,
p. 41-54.

Preprint, final paper at ScienceDirect, data and code in in varcomp, also in tdlib. - Rao, C.R., Kleffe, J. Estimation of variance components and applications, North-Holand, Amsterdam, (North-Holland Series in Statistics and probability, v.3), 1988, 370 pp.
- V.V. Kuzmenko, I.A. Uspenskaya, E.B. Rudnyi. Simultaneous assessment of
thermodynamic functions of calcium aluminates. Bulletin des Societes
Chimiques Belges, 1997, v. 106, N 5, p. 235-243.

Preprint, data and code in varcomp. - E.B. Rudnyi. Statistical model of systematic errors: An assessment of
the Ba-Cu and Cu-Y phase diagram. Chemometrics and Intelligent Laboratory
Systems, 1997, v. 36, p. 213-227.

Preprint, final paper at ScienceDirect, data and code in varcomp, for Ba-Cu also in tdlib. - E.B. Rudnyi, V.V. Kuzmenko, G.V. Voronin. Simultaneous assessment of
the YBa2Cu3O6+z thermodynamics under the linear error model. J. Phys.
Chem. Ref. Data, 1988, v. 27, N 5, p. 855-888.

Preprint, final paper at AIP, data and code in varcomp.

http://Evgenii.Rudnyi.Ru