Curve fitting (command line)

Curve fitting is an integral part of quantitative analysis. Usually the experimental data is fitted to a predictive analytical model. Here an example code has been explained to fit CPMG relaxation dispersion NMR data using Richard Craver model. The codes are written in both matlab as well as C language.

Matlab:

In the attached downloadable matlab zip file the following files are present:

1) Curve_fit.m
        In this file please set the parameter
            param_search=1;             % If exhaustive search for parameters is needed; takes long time but gives excellent fit
            or
            param_search=0;             % for quick fit,sometimes fitting may fail
    
    
        criteria_res_norm=1e-2;     % Minimum squared sum of residuals, lower the value more stringent the fitting will be and yields                                                                                                % excellent results

2) model_finc.m

        In this file please define the model function to be fitted. E.g if you want to fit to a linear equation Y=mX+c, we have two parameters m,c to be fitted. So, the         content of this file should be,

        function z=model_func(a,x);
        z=a(1).*x+a(2);
        or using explicit loop
        function z=model_func(a,x);
        [row,col]=size(x);
        for i=1:row
        z(i,1)=a(1)*x(i)+a(2);
        end


        note that in the first definition of  z=a(1).*x+a(2); we have used .* for element wise multiplication of vector elements.

        In the attachment file the "Richard Craver" model was defined.

3) Data:
        This file contains the x and y values as first and second column. Example content of Data file is:
        0    21.8783973163
        1    32.6781264109
        2    51.4573515575
        3    90.2250441136
        4    115.6549521303
        5    188.5256850821
        6    218.6186036507
        7    268.4985192822
        8    294.7800862444
        9    314.6647993941
        10    287.9013078371
        11    277.8269737847
        12    327.831676702
        13    297.2861193764
        14    312.7020443178


        The first column is X value, the second column is Y value.

4) Limits:
        This file contains the lower and upper limit of the fitting parameter. The number of rows corresponds to the number of fit parameters in the model function. For the above example of linear line fit, we have defined the model function as z=a(1).*x+a(2). The first and the second parameter are a(1) and a(2) which corresponds to 'm' and 'c' in the analytical expression. So the content of the limit files are as follows,

        -10000    10000    0    5
        -20000    20000    0    4

        The first row  -10000    10000    0    5  corresponds to paramater 'm'. The first column -10000 is the lower limit, 10000 is the upper limit within which the algorithm will search for the best value of 'm'. the third column '0' specifies that the fit value is either positive or negative. '1' is for only poisitive, '-1' is for only negative value. The fourth column 5 specifies that the  algorithm should split the interval -10000 to 10000 in to 5 geometric progression based partition (say, -10000, -100, 0, 100, 10000). lesser the magnitude of this value the faster the program completes, but may fail to converge to best solution. Larger  the value for this entry longer the program runs but ensures good results.

5) Results_out:
        This is the name of the results of the fitting with standard error and the back calculated value of Y, along with the confidence interval. Example fit result for 4 CPMG parameters are as follows:

        The fitted parameters with standard error and confidence interval at 95% level (alpha=5%)
        __________________________________________________________________________

        -10.000353             +/-    0.011333    [-10.023791 to -9.973288]
        -2591.00502           +/-    1.836048    [-2595.399443 to -2587.217504]
         0.502156               +/-    0.005422    [0.487923 to 0.512086]
         12.999901             +/-    0.010593    [12.974459 to 13.021662]



        The fitted data and its confidence interval at 95% level (alpha=5%)
        _________________________________________________________
    
        x_expt     y_exp    y_pred    [y_ci_lower    y_ci_upper]
        ------------------------------------------------------------------------------

        20.000000    7.870000    7.871026 [7.857646 to 7.884407]
        32.800000    7.800000    7.800339 [7.786799 to 7.813879]
        65.600000    7.780000    7.778339 [7.763084 to 7.793593]
        131.200000    7.010000    7.011901 [7.005356 to 7.018447]
        262.400000    6.740000    6.736953 [6.731555 to 6.742352]
        328.000000    5.670000    5.671648 [5.667968 to 5.675328]
        393.600000    4.970000    4.969932 [4.967188 to 4.972676]
        459.200000    4.500000    4.500416 [4.497963 to 4.502869]
        524.800000    4.180000    4.175870 [4.173345 to 4.178394]
        590.400000    3.940000    3.944011 [3.941288 to 3.946734]
        656.000000    3.770000    3.773382 [3.770445 to 3.776319]
        721.600000    3.640000    3.644522 [3.641391 to 3.647654]
        787.200000    3.550000    3.545007 [3.541709 to 3.548306]
        852.800000    3.470000    3.466652 [3.463213 to 3.470090]



C Language:

If you want to perform Curve fit using a c program, you can try Curve_fit_C.zip (see the bottom of this page). This program requires GSL scientific library (>=1.16) for its compilation. Please look in to the README file for how to compile this program.

The attached file contains the source file:
        LVM_primer.c
        and
        LVM_algorithm_modified.c
   
        Please change the parameters:
        #define Trial_max 100               // Increase for deeper search but will take more time
        #define Geometric_search 2      // 0) Quick random search uses CHI_SQUARE_LIMIT & Trial_max
                                                       // 1) Complete random search tries Trial_max times and chooses the best of lowest chi-square
                                                       // 2) Completely organized; chooses initial parameters based on geometric progression and outputs                                                                 best chi-squared values.

        
        #define CHI_SQUARE_LIMIT 0.00001    // Lower the chi-square the better the fitting will be but may take long time. If fitting                                                                                                                    // does'nt occur give a higher value like 0.1
        #define Iter_max 10000                          // Higher this value better the fit will be but it may take long time.


    Modify the model function func_calc


            int func_calc(double *param, void *calc_data)
                    {
                        int n = ((Data_nongsl *)calc_data)->n;
                        int p = ((Data_nongsl *)calc_data)->p;
                        double *x_data = ((Data_nongsl *)calc_data)->x_data;
                        int i,j;


                        for(j=0;j<p;j++){if(param[j]==0){param[j]=pow(10,-10);}}



                        for(i=0;i<n;i++)
                                {

                                    ((Data_nongsl *)calc_data)->y_calc_data[i]= param[0]+param[1]*exp(-1*param[2]*x_data[i]);

                                 }

                    }



                    Change only the line highlighted in blue. In the above case the model function is an exponential: Y=A+B*exp(-1*C*X);
                    for a linear fit the above line would be:

                                                ((Data_nongsl *)calc_data)->y_calc_data[i]= param[0]+param[1]*x_data[i];

                    note that param[0] corresponds to 'c' and param[1] corresponds to 'm'. If the model includes many fit parameters, use param[0] to param[n], where n is the index for the last parameter. Accordingly, the number of rows in the Limit file should match with the number of parameters present in the model (see below for the preparation of input files).

Compile the program LVM_primer.c as follows (no need to compile LVM_algorithm_modified.c as this file will be compiled automatically when LVM_primer.c is compiled),

        gcc -o LVM_primer LVM_primer.c -lm /home/Software/GSL/lib/libgsl.a /home/Software/GSL/lib/libgslcblas.a

Run the program as

      ./
LVM_primer

This program requires two input files: Data, Limit. Prepare the input files are explained above for matlab curve fit program or see the example files present in the attached file. The model considered here is Richard Craver model for CPMG relaxation dispersion.


Fit_results and Raw_results are the output file. Raw_results contains all the fits including the ones that are bad. Fit_results is the best of all the fits present in Raw_results.

(Please click the download arrow to download the files correctly )

ċ
Curve_fit_C.zip
(15k)
Janarthanan Krishnamoorthy,
Jul 11, 2015, 5:24 AM
ċ
Curve_fit_matlab.zip
(4k)
Janarthanan Krishnamoorthy,
Jul 11, 2015, 5:12 AM
Comments