Recently, Microsoft introduced the so-called LAMBDA
function allowing users to define custom Excel functions. The concept of user-defined functions in Excel is not new. However, in the past these had to be coded in VBA first and then made available as corresponding functions in Excel in a rather tedious way. The corresponding Excel files were macro files (extension: e.g. *.xlsm). As a result, these files were often not accepted by customers from industry as their IT security system blocked the executions of macro code. With newer Excel versions (e.g. Excel 2021) you can create user-defined functions using Excel’s formula language, i.e. using the so-called LAMBDA
function. I’ve tested the LAMBDA
function with respect to its usability in the field of data analysis and I wanted to share my first experiences with you guys.
Creating a simple user-defined function
To get a bit comfortable with the LAMBDA
function, let’s have a look at a very basic example. Say we want to “encode” the four parameter logistic (4PL) function that is often used for the curve fitting of ELISA-Assays or for logistic regression:
has four input arguments: the lower asymptote , the upper asymptote , the Hill slope and the -value where 50 percent of the signal is reached . Now let us start defining the corresponding LAMBDA
function entering the following code into an Excel cell:
=LAMBDA(x,D,A,B,C,D+(A-D)/(1+EXP(B*LN(x)-C)))
After hitting <Enter> you should have received a #CALC!
error warning. No worries, this is not bad per se! It just means that although your LAMBDA
function has five input arguments (x, D, A, B and C), you have not actually provided them (e.g. as concrete numbers). You could do so e.g. by adding the following inputs in brackets:
=LAMBDA(x,D,A,B,C,D+(A-D)/(1+EXP(B*LN(x)-C)))({0.000001,0.00001,0.0001,0.001,0.01,0.1},10,1,1,-4)
Herein, the input is an array of 6 numbers ranging from to . I entered concrete numbers here, but this could have been a range of the same numbers defined in some cells, too. The second to fifth inputs are single numbers. When you hit <Enter>, the output should be an array of the same size as . Now we make this function generally available in the Excel workbook at hand and assign it the name FourPL
. Therefore, copy this part
=LAMBDA(x,D,A,B,C,D+(A-D)/(1+EXP(B*LN(x)-C)))
and go to Formulas
and open the Name Manager
. Click on New...
to assign the aforementioned Name
. You might change the Scope
(which we will not do here). Add a comment, e.g. to explain the different input arguments and the function outputs. Finally, paste the LAMBDA
function into the Refers to
field. Clicking OK
will finish the process. Close the Name Manager
again and you can try using the FourPL
function in an Excel cell. After typing Four
into a cell, Excel should already suggest the FourPL
function.
Creating a more advanced user-defined function for linear regression
In the previous section we used the LAMBDA
function to add a 4PL function to our Excel function repertoire. Now we are going to define a user-defined function for the estimation of the coefficients of a linear regression calculation in preparation for the iteratively reweighted least-square (IRLS) regression described at the end of this post. Let’s shortly recapitulate the matrix notation of linear regression:
Herein, denotes the vector of responses (e.g. absorbance signal). denotes the design or Jacobi matrix and denotes the vector of regression coefficients. denotes the residuals (that are assumed to be normally distributed with mean 0 and variance ). Linear regression minimizes the sum of the squared residuals . After differentiation (with repect to ) and setting the resulting equation to zero, we finally end-up with the following matrix equation:
This last formula is exactly what we are going to store in a user-defined function. We’ll call it
LinReg
. Let us first think about the in- and outputs to this function. The user shall input the design Matrix and the response vector . LinReg
then returns the coefficient vector . As these in- and outputs correspond to arrays (in Excel terms) the LAMBDA
function contains array functions such as MMULT
and TRANSPOSE
:=LAMBDA(X,y,MMULT(MINVERSE(MMULT(TRANSPOSE(X),X)),MMULT(TRANSPOSE(X),y)))
To test the LAMBDA
function you may enter concrete arrays for and . For instance, say we obtained the following aborbance measurements in the lab:
Concentration values | Absorbance values |
0.05 | 0.572 |
0.55 | 1.241 |
1.05 | 2.215 |
1.55 | 2.175 |
2.05 | 3.308 |
Then the design matrix is:
which needs to be available as a corresponding range in Excel (say E2:F6
). is simply the vector of the absorbance values and must also be available as a dedicated range in Excel (say C2:C6
). Then you can test the above LAMBDA
function by inputting these ranges:
=LAMBDA(X,y,MMULT(MINVERSE(MMULT(TRANSPOSE(X),X)),MMULT(TRANSPOSE(X),y)))(E2:F6,C2:C6)
The result vector should be
Finally, you can again go to the Name Manager
under Formulas
and assign the function name LinReg
so this function becomes available for calculation as any other conventional Excel function.
Recursive LAMBDA functions
Here is where I think that the LAMBDA
has a great potential to replace VBA-code. You can use LAMBDA
functions in a recursive manner. What does that mean? The LAMBDA
function calls itself inside the body of the function definition. That sounds a bit abstract. So let’s make a simple example from maths. The factorial of a non-negative integer is defined by
By definition, 0! is equal to 1. The factorial of 5 is . Although Excel already has a built-in function to calculate the factorial, namely FACT
, we want to create a custom function FACTORIAL
that calculates exactly the same as FACT
and thus learn what recursive LAMBDA
function actually means. Here is how you do it:
=LAMBDA(n,IF(n<=1,1,FACTORIAL(n-1)*n))
If you type this into a cell in Excel you get again the #CALC!
error that we already know. However, even if you pass 5 as input
=LAMBDA(n,IF(n<=1,1,FACTORIAL(n-1)*n))(5)
you will not succeed and get a #NAME?
error as the expression FACTORIAL
is unknown to Excel. Before we look closer at the functional expression, you must first acquaint Excel with that FACTORIAL
function. Therefore, copy the above expression (without the input 5 in brackets) and go to FORMULAS
and to NAME MANAGER
to create a NEW...
user-defined function. Paste the expression in the Refers to:
text box and assign FACTORIAL
as Name
. Now the above test expression with 5 as an input should return 120 in the corresponding cell. But how does the function actually work? Let’s use our fiver example to step through the function. When inputting into the above function, the IF-statement will first be false and thus the actually recursive part, FACTORIAL(n-1)*n
will be executed. Plugging in the concrete numbers: FACTORIAL(4)*5
is what is calculated. In the next step the FACTORIAL function will again be passed with (first part), so that we get FACTORIAL(3)*4
and in total FACTORIAL(3)*4*5
. The moment becomes smaller or equal to 1, the if statement becomes true and the function leaves the recursive part to return a 1 only. The IF-statment in the LAMBDA
definition is very important since otherwise function executions do not stop and Excel gets stuck in a never-ending loop. When everything works at the end, you should receive 120 for an input of 5 to the FACTORIAL
function. Comparing this to Excel’s internal function FACT
, reveals the correctness of our user-defined FACTORIAL
function.
Now let us come to a more practical example for recursive LAMBDA function to be used in data analysis.
Calculating the Wilcoxon-Mann-Whitney U statistic using a recursive LAMBDA function
The Mann-Whitney U-test is a famous non-parametric hypothesis test to, loosely said, compare the medians of two data groups. It is actually the non-parametric pendant to the even more famous Student t-test being used to compare the means of two data groups. Compared to the Student t-test, the Mann-Whitney U-test does not make any assumption about the underlying distribution of the data. It is not my intention to go too much into detail about this test here (this is probably going to be a saparate post). Just to mention that there is one important recursive equation derived in the original paper written by Mann and Whitney to calculate the p-value corresponding to the test statistic calculated as the minimum of and :
Herein, and denote the number of data points in group and , respectively. denotes the sum of ranks.
The recursion formula from the Mann and Whitney paper is now as follows:
then denotes the aforementioned test statistic, i.e. . In the recursion formula, and can be viewed as running indices, where and run from 0 to and , respectively. As shown in their paper, Mann and Whitney state the following “constraints”: if and also that are either zero or one depending on whether or . These constraints have caused me to split the task into two LAMBDA
functions, i.e. one to calculate the -value (called PU
) incorporating the constraints and one to represent the actual recursion formula from above (we’ll call it MWUPVAL
).
Below the definition of the LAMBDA
function PU
:
=LAMBDA(u,na,nb,IF(OR(u<0,u>na*nb),0,IF(OR(na=0,nb=0),IF(u<>0,0,1),IF(AND(na>0,nb>0),(na*PU(u,na-1,nb)+nb*PU(u-na,na,nb-1))/(na+nb)))))
Make sure to assign this LAMBDA
function the name PU
using Name Manager
in Excel. The second LAMBDA
function MWUPVAL
is then as follows:
=LAMBDA(u,na,nb,IF(u<0,0,MWUPVAL(u-1,na,nb)+PU(u,na,nb)))
Note that this second LAMBDA
function calls the first LAMBDA
function PU
. Make sure to assign this second LAMBDA
function the name MWUPVAL
using the Name Manger
in Excel.
Finally, you might try the MWUPVAL
function on the data set below:
Data group A (Rank) | Data group B (Rank) |
54 (8) | 42 (4) |
23 (1) | 34 (2) |
45 (5) | 65 (9) |
53 (7) | 77 (11) |
41 (3) | 46 (6) |
73 (10) |
I’ve written the ranks in brackets corresponding to the measurement values in the table. Herein , and , . Thus, is equal to 13. Using MWUPVAL(13,6,5)
finally calculates the corresponding (one-sided) p-value of 0.396 which is rather big indicating that the two group medians are not significantly different (based on a significance level of ).
Please note that I tried to keep the section about the Mann-Whitney U-test short as this might be detailed later in a dedicated blog post. Here I just wanted to give you a (admittedly rather complex) example for a recursive LAMBDA
function. So please don’t resent me if not everything was immediately obvious!
Reference
H. B. Mann, D. R. Whitney. “On a Test of Whether one of Two Random Variables is Stochastically Larger
than the Other.” The Annals of Mathematical Statistics, 18(1) 50-60 March, 1947.
https://doi.org/10.1214/aoms/1177730491
Iteratively reweighted least-square (IRLS) using a recursive LAMBDA function
Now comes the surpreme discipline. We will combine the knowledge from the previous sections about linear regression in matrix form and about recursive LAMBDA
functions to perform an iteratively reweighted least-square regression (IRLS). IRLS is often used with generalized linear models (GLM), such as with logistic regression. A conventional least-square regression is not appropriate in these cases as the pre-requisites of conventional least-square regression are violated (e.g. constant variance of the residuals). Thus, an appropriate weighting is required. The matrix equation that we are going to implement hence contains a weighting matrix that does not have fixed weighting coefficients but coefficients that are optimized within the least-square procedure:
E.g. in practice, it can occur that the scatter increases with increasing response values (e.g. often seen with ELISA readouts). To account for that is a common weighting scheme used in these cases. Remembering that the above equation becomes recursive:
In contrast to conventional weighted least-square regression the above equation is calculated in an iterative manner.
Now let us implement this as a
LAMBDA
function in Excel:=LAMBDA(i,X,b,y,IF(i<=1,b,MMULT(MINVERSE(MMULT(TRANSPOSE(X),MMULT(MUNIT(COUNT(y))1/(MMULT(X,WLINREG(i-1,X,b;y)))^2,X))),MMULT(TRANSPOSE(X),MMULT(MUNIT(COUNT(y))1/(MMULT(X,WLINREG(i-1,X,b,y)))^2,y)))))
The input to this function contains the maximum number of iterations , the design matrix and denoting the initial guess vector and the vector of responses. I do not want to go into the details of this function, but again you might recognize the IF-statement to ensure that the iteration fades out when approaching 1. In practice, iterations are stopped when the -value of two successive iterations are indistinguishable to some degree of (predefined) tolerance. However, this would have made the LAMBDA
function expression even more complicated and so I simply limited the number of iterations using the parameter . MINVERSE
, MMULT
, MUNIT
and TRANSPOSE
are standard Excel array functions. WLINREG
, however, is the name of the user-defined function that I assigned at the end to the above LAMBDA
-function. Using WLINREG
with the data from section 2 and with , and the same design matrix as in section 2, you should get for which is slightly different from what we obtained for the conventional (non-weighted) linear regression.
Please note that the number of recursions currently seems quite limited in Excel. E.g. when I try with the WLINREG
function, Excel is working quite hardly but returns 0 at the end. Not sure, but also the number of executable recursions seems to depend on the number of function evaluations inside the LAMBDA
function. And as our LAMBDA
function has quite a lot of functions (including all the array functions), the number of recursions seems limited.
There are currently other limitations for LAMBDA
functions. One of the most obvious ones is that its use is limited to the current Excel workbook. That means a user-defined LAMBDA
function is unfortunately not available in other Excel workbooks.
Another issue is that LAMBDA
functions are not backward compatible, e.g. if you open an Excel workbook in Excel 2016 that had been created with Excel 2021, the numbers of the LAMBDA
function might be displayed first, but when you edit the cell the corresponding cell will return an error afterwards.