How to use Excel’s LAMBDA Function for Data Analysis?

Recently, Microsoft introduced the so-called LAMBDA function allowing users to define custom Excel functions. The concept of user-defined functions in Excel is not new. However, in the past these had to be coded in VBA first and then made available as corresponding functions in Excel in a rather tedious way. The corresponding Excel files were macro files (extension: e.g. *.xlsm). As a result, these files were often not accepted by customers from industry as their IT security system blocked the executions of macro code. With newer Excel versions (e.g. Excel 2021) you can create user-defined functions using Excel’s formula language, i.e. using the so-called LAMBDA function. I’ve tested the LAMBDA function with respect to its usability in the field of data analysis and I wanted to share my first experiences with you guys.

Creating a simple user-defined function

To get a bit comfortable with the LAMBDA function, let’s have a look at a very basic example. Say we want to “encode” the four parameter logistic (4PL) function that is often used for the curve fitting of ELISA-Assays or for logistic regression:

    \[f(x) = D + \frac{A-D}{1+\exp\left(B \log(x) - C \right)}\]

f(x) has four input arguments: the lower asymptote D, the upper asymptote A, the Hill slope B and the x-value where 50 percent of the signal is reached C. Now let us start defining the corresponding LAMBDA function entering the following code into an Excel cell:

=LAMBDA(x,D,A,B,C,D+(A-D)/(1+EXP(B*LN(x)-C)))

After hitting <Enter> you should have received a #CALC!error warning. No worries, this is not bad per se! It just means that although your LAMBDA function has five input arguments (x, D, A, B and C), you have not actually provided them (e.g. as concrete numbers). You could do so e.g. by adding the following inputs in brackets:

=LAMBDA(x,D,A,B,C,D+(A-D)/(1+EXP(B*LN(x)-C)))({0.000001,0.00001,0.0001,0.001,0.01,0.1},10,1,1,-4)

Herein, the input x is an array of 6 numbers ranging from 10^{-6} to 10^{-1}. I entered concrete numbers here, but this could have been a range of the same numbers defined in some cells, too. The second to fifth inputs are single numbers. When you hit <Enter>, the output should be an array of the same size as x. Now we make this function generally available in the Excel workbook at hand and assign it the name FourPL. Therefore, copy this part

=LAMBDA(x,D,A,B,C,D+(A-D)/(1+EXP(B*LN(x)-C)))

and go to Formulas and open the Name Manager. Click on New... to assign the aforementioned Name. You might change the Scope (which we will not do here). Add a comment, e.g. to explain the different input arguments and the function outputs. Finally, paste the LAMBDA function into the Refers to field. Clicking OK will finish the process. Close the Name Manager again and you can try using the FourPL function in an Excel cell. After typing Four into a cell, Excel should already suggest the FourPL function.

Creating a more advanced user-defined function for linear regression

In the previous section we used the LAMBDA function to add a 4PL function to our Excel function repertoire. Now we are going to define a user-defined function for the estimation of the coefficients of a linear regression calculation in preparation for the iteratively reweighted least-square (IRLS) regression described at the end of this post. Let’s shortly recapitulate the matrix notation of linear regression:

    \[\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{\epsilon}\]

Herein, \mathbf{y} denotes the vector of responses (e.g. absorbance signal). \mathbf{X} denotes the design or Jacobi matrix and \mathbf{b} denotes the vector of regression coefficients. \mathbf{\epsilon} denotes the residuals (that are assumed to be normally distributed with mean 0 and variance \sigma^2). Linear regression minimizes the sum of the squared residuals \sum_i \epsilon_i^2 = \epsilon^T \epsilon = \left( \mathbf{y} - \mathbf{X} \mathbf{b}\right)^T \left( \mathbf{y} - \mathbf{X} \mathbf{b}\right). After differentiation (with repect to \mathbf{b}) and setting the resulting equation to zero, we finally end-up with the following matrix equation:

    \[\mathbf{b} = \left( \mathbf{X} ^T \mathbf{X}  \right)^{-1} \mathbf{X} ^T \mathbf{y} \]


This last formula is exactly what we are going to store in a user-defined function. We’ll call it LinReg. Let us first think about the in- and outputs to this function. The user shall input the design Matrix \mathbf{X} and the response vector \mathbf{y}. LinReg then returns the coefficient vector \mathbf{b}. As these in- and outputs correspond to arrays (in Excel terms) the LAMBDA function contains array functions such as MMULT and TRANSPOSE:

=LAMBDA(X,y,MMULT(MINVERSE(MMULT(TRANSPOSE(X),X)),MMULT(TRANSPOSE(X),y)))

To test the LAMBDA function you may enter concrete arrays for \mathbf{X} and \mathbf{y}. For instance, say we obtained the following aborbance measurements in the lab:

Concentration values xAbsorbance values y
0.050.572
0.551.241
1.052.215
1.552.175
2.053.308

Then the design matrix is:

    \[\mathbf{X} = \begin{bmatrix}1 & 0.05\\1 & 0.55\\1 & 1.05\\1 & 1.55\\1 & 2.05\end{bmatrix}\]

which needs to be available as a corresponding range in Excel (say E2:F6). \mathbf{y} is simply the vector of the absorbance values and must also be available as a dedicated range in Excel (say C2:C6). Then you can test the above LAMBDA function by inputting these ranges:

=LAMBDA(X,y,MMULT(MINVERSE(MMULT(TRANSPOSE(X),X)),MMULT(TRANSPOSE(X),y)))(E2:F6,C2:C6)

The result vector \mathbf{b} should be [0.5572, 1.2808]

Finally, you can again go to the Name Manager under Formulas and assign the function name LinReg so this function becomes available for calculation as any other conventional Excel function.

Recursive LAMBDA functions

Here is where I think that the LAMBDA has a great potential to replace VBA-code. You can use LAMBDA functions in a recursive manner. What does that mean? The LAMBDA function calls itself inside the body of the function definition. That sounds a bit abstract. So let’s make a simple example from maths. The factorial of a non-negative integer n is defined by

    \[n! = n \cdot (n-1) \cdot (n-2) \cdot \dots 2 \cdot 1 = n \cdot (n-1)!\]

By definition, 0! is equal to 1. The factorial of 5 is 5! = 120. Although Excel already has a built-in function to calculate the factorial, namely FACT, we want to create a custom function FACTORIAL that calculates exactly the same as FACT and thus learn what recursive LAMBDA function actually means. Here is how you do it:

=LAMBDA(n,IF(n<=1,1,FACTORIAL(n-1)*n))

If you type this into a cell in Excel you get again the #CALC! error that we already know. However, even if you pass 5 as input

=LAMBDA(n,IF(n<=1,1,FACTORIAL(n-1)*n))(5)

you will not succeed and get a #NAME? error as the expression FACTORIAL is unknown to Excel. Before we look closer at the functional expression, you must first acquaint Excel with that FACTORIAL function. Therefore, copy the above expression (without the input 5 in brackets) and go to FORMULAS and to NAME MANAGER to create a NEW... user-defined function. Paste the expression in the Refers to: text box and assign FACTORIAL as Name. Now the above test expression with 5 as an input should return 120 in the corresponding cell. But how does the function actually work? Let’s use our fiver example to step through the function. When inputting n=5 into the above function, the IF-statement will first be false and thus the actually recursive part, FACTORIAL(n-1)*n will be executed. Plugging in the concrete numbers: FACTORIAL(4)*5 is what is calculated. In the next step the FACTORIAL function will again be passed with n=4 (first part), so that we get FACTORIAL(3)*4 and in total FACTORIAL(3)*4*5. The moment n becomes smaller or equal to 1, the if statement becomes true and the function leaves the recursive part to return a 1 only. The IF-statment in the LAMBDA definition is very important since otherwise function executions do not stop and Excel gets stuck in a never-ending loop. When everything works at the end, you should receive 120 for an input of 5 to the FACTORIAL function. Comparing this to Excel’s internal function FACT, reveals the correctness of our user-defined FACTORIAL function.
Now let us come to a more practical example for recursive LAMBDA function to be used in data analysis.

Calculating the Wilcoxon-Mann-Whitney U statistic using a recursive LAMBDA function

The Mann-Whitney U-test is a famous non-parametric hypothesis test to, loosely said, compare the medians of two data groups. It is actually the non-parametric pendant to the even more famous Student t-test being used to compare the means of two data groups. Compared to the Student t-test, the Mann-Whitney U-test does not make any assumption about the underlying distribution of the data. It is not my intention to go too much into detail about this test here (this is probably going to be a saparate post). Just to mention that there is one important recursive equation derived in the original paper written by Mann and Whitney to calculate the p-value corresponding to the test statistic U calculated as the minimum of U_A and U_B:

    \begin{align*}U_A &= n_An_B+\frac{n_A(n_A+1)}{2}-\sum\limits_{i=1}^{n_A} R_i \\U_B &= n_An_B+\frac{n_B(n_B+1)}{2}-\sum\limits_{i=1}^{n_B} R_i \end{align}

Herein, n_A and n_B denote the number of data points in group A and B, respectively. \sum_{i} R_i denotes the sum of ranks.

The recursion formula from the Mann and Whitney paper is now as follows:

    \[p_{n_A,n_B}(U)=\frac{n_A}{n_A+n_B}p_{n_A-1,n_B}(U-n_B)+\frac{n_B}{n_A+n_B}p_{n_A,n_B-1}(U)\]

U then denotes the aforementioned test statistic, i.e. U = \min(U_A, U_B). In the recursion formula, n_A and n_B can be viewed as running indices, p_{i,j} where i and j run from 0 to n_A and n_B, respectively. As shown in their paper, Mann and Whitney state the following “constraints”: p_{i,j}(U)=0 if U<0 and also that p_{i,0}(U), p_{0,j}(U) are either zero or one depending on whether U \neq 0 or U=0. These constraints have caused me to split the task into two LAMBDA functions, i.e. one to calculate the p(U)-value (called PU) incorporating the constraints and one to represent the actual recursion formula from above (we’ll call it MWUPVAL).

Below the definition of the LAMBDA function PU:

=LAMBDA(u,na,nb,IF(OR(u<0,u>na*nb),0,IF(OR(na=0,nb=0),IF(u<>0,0,1),IF(AND(na>0,nb>0),(na*PU(u,na-1,nb)+nb*PU(u-na,na,nb-1))/(na+nb)))))

Make sure to assign this LAMBDA function the name PU using Name Manager in Excel. The second LAMBDA function MWUPVAL is then as follows:

=LAMBDA(u,na,nb,IF(u<0,0,MWUPVAL(u-1,na,nb)+PU(u,na,nb)))

Note that this second LAMBDA function calls the first LAMBDA function PU. Make sure to assign this second LAMBDA function the name MWUPVAL using the Name Manger in Excel.

Finally, you might try the MWUPVAL function on the data set below:

Data group A (Rank)Data group B (Rank)
54 (8)42 (4)
23 (1)34 (2)
45 (5)65 (9)
53 (7)77 (11)
41 (3)46 (6)
73 (10)

I’ve written the ranks in brackets corresponding to the measurement values in the table. Herein n_A=6, n_B=5 and U_A=17, U_B=13. Thus, U = \min(U_A, U_B) is equal to 13. Using MWUPVAL(13,6,5) finally calculates the corresponding (one-sided) p-value of 0.396 which is rather big indicating that the two group medians are not significantly different (based on a significance level of \alpha=0.05).


Please note that I tried to keep the section about the Mann-Whitney U-test short as this might be detailed later in a dedicated blog post. Here I just wanted to give you a (admittedly rather complex) example for a recursive LAMBDA function. So please don’t resent me if not everything was immediately obvious!
Reference
H. B. Mann, D. R. Whitney. “On a Test of Whether one of Two Random Variables is Stochastically Larger
than the Other.” The Annals of Mathematical Statistics, 18(1) 50-60 March, 1947.
https://doi.org/10.1214/aoms/1177730491

Iteratively reweighted least-square (IRLS) using a recursive LAMBDA function

Now comes the surpreme discipline. We will combine the knowledge from the previous sections about linear regression in matrix form and about recursive LAMBDA functions to perform an iteratively reweighted least-square regression (IRLS). IRLS is often used with generalized linear models (GLM), such as with logistic regression. A conventional least-square regression is not appropriate in these cases as the pre-requisites of conventional least-square regression are violated (e.g. constant variance of the residuals). Thus, an appropriate weighting is required. The matrix equation that we are going to implement hence contains a weighting matrix \mathbf{W} that does not have fixed weighting coefficients but coefficients that are optimized within the least-square procedure:

    \[\mathbf{b} = \left( \mathbf{X} ^T \mathbf{W}  \mathbf{X}  \right)^{-1} \mathbf{X} ^T \mathbf{W} \mathbf{y} \]

E.g. in practice, it can occur that the scatter increases with increasing response values (e.g. often seen with ELISA readouts). To account for that \mathbf{W}=1/y^2 = y^{-2} is a common weighting scheme used in these cases. Remembering that \mathbf{y} = \mathbf{X}\mathbf{b} the above equation becomes recursive:

    \[\mathbf{b}^{(i)} = \left( \mathbf{X} ^T \left(\mathbf{X}\mathbf{b}^{(i-1)} \right)^{-2}  \mathbf{X}  \right)^{-1} \mathbf{X} ^T \left(\mathbf{X}\mathbf{b}^{(i-1)} \right)^{-2} \mathbf{y} \]


In contrast to conventional (weighted) least-square regression the above equation is calculated in an iterative manner.
Now let us implement this as a LAMBDA function in Excel:

=LAMBDA(i,X,b,y,IF(i<=1,b,MMULT(MINVERSE(MMULT(TRANSPOSE(X),MMULT(MUNIT(COUNT(y))1/(MMULT(X,WLINREG(i-1,X,b;y)))^2,X))),MMULT(TRANSPOSE(X),MMULT(MUNIT(COUNT(y))1/(MMULT(X,WLINREG(i-1,X,b,y)))^2,y)))))

The input to this function contains the maximum number of iterations i, the design matrix \mathbf{X} and \mathbf{b} denoting the initial guess vector and \mathbf{y} the vector of responses. I do not want to go into the details of this function, but again you might recognize the IF-statement to ensure that the iteration fades out when approaching 1. In practice, iterations are stopped when the b-value of two successive iterations are indistinguishable to some degree of (predefined) tolerance. However, this would have made the LAMBDA function expression even more complicated and so I simply limited the number of iterations using the parameter i. MINVERSE, MMULT, MUNIT and TRANSPOSE are standard Excel array functions. WLINREG, however, is the name of the user-defined function that I assigned at the end to the above LAMBDA-function. Using WLINREG with the data from section 2 and with i=10, \mathbf{b}=[1, 1] and the same design matrix \mathbf{X} as in section 2, you should get [0.513, 1.334] for \mathbf{b} which is slightly different from what we obtained for the conventional (non-weighted) linear regression.

Please note that the number of recursions currently seems quite limited in Excel. E.g. when I try i=30 with the WLINREG function, Excel is working quite hardly but returns 0 at the end. Not sure, but also the number of executable recursions seems to depend on the number of function evaluations inside the LAMBDA function. And as our LAMBDA function has quite a lot of functions (including all the array functions), the number of recursions seems limited.
There are currently other limitations for LAMBDA functions. One of the most obvious ones is that its use is limited to the current Excel workbook. That means a user-defined LAMBDA function is unfortunately not available in other Excel workbooks.
Another issue is that LAMBDA functions are not backward compatible, e.g. if you open an Excel workbook in Excel 2016 that had been created with Excel 2021, the numbers of the LAMBDA function might be displayed first, but when you edit the cell the corresponding cell will return an error afterwards.

Leave a Reply