Recently, Microsoft introduced the so-called
LAMBDA function allowing users to define custom Excel functions. The concept of user-defined functions in Excel is not new. However, in the past these had to be coded in VBA first and then made available as corresponding functions in Excel in a rather tedious way. The corresponding Excel files were macro files (extension: e.g. *.xlsm). As a result, these files were often not accepted by customers from industry as their IT security system blocked the executions of macro code. With newer Excel versions (e.g. Excel 2021) you can create user-defined functions using Excel’s formula language, i.e. using the so-called
LAMBDA function. I’ve tested the
LAMBDA function with respect to its usability in the field of data analysis and I wanted to share my first experiences with you guys.
Creating a simple user-defined function
To get a bit comfortable with the
LAMBDA function, let’s have a look at a very basic example. Say we want to “encode” the four parameter logistic (4PL) function that is often used for the curve fitting of ELISA-Assays or for logistic regression:
has four input arguments: the lower asymptote , the upper asymptote , the Hill slope and the -value where 50 percent of the signal is reached . Now let us start defining the corresponding
LAMBDA function entering the following code into an Excel cell:
After hitting <Enter> you should have received a
#CALC!error warning. No worries, this is not bad per se! It just means that although your
LAMBDA function has five input arguments (x, D, A, B and C), you have not actually provided them (e.g. as concrete numbers). You could do so e.g. by adding the following inputs in brackets:
Herein, the input is an array of 6 numbers ranging from to . I entered concrete numbers here, but this could have been a range of the same numbers defined in some cells, too. The second to fifth inputs are single numbers. When you hit <Enter>, the output should be an array of the same size as . Now we make this function generally available in the Excel workbook at hand and assign it the name
FourPL. Therefore, copy this part
and go to
Formulas and open the
Name Manager. Click on
New... to assign the aforementioned
Name. You might change the
Scope (which we will not do here). Add a comment, e.g. to explain the different input arguments and the function outputs. Finally, paste the
LAMBDA function into the
Refers to field. Clicking
OK will finish the process. Close the
Name Manager again and you can try using the
FourPL function in an Excel cell. After typing
Four into a cell, Excel should already suggest the
Creating a more advanced user-defined function for linear regression
In the previous section we used the
LAMBDA function to add a 4PL function to our Excel function repertoire. Now we are going to define a user-defined function for the estimation of the coefficients of a linear regression calculation in preparation for the iteratively reweighted least-square (IRLS) regression described at the end of this post. Let’s shortly recapitulate the matrix notation of linear regression:
Herein, denotes the vector of responses (e.g. absorbance signal). denotes the design or Jacobi matrix and denotes the vector of regression coefficients. denotes the residuals (that are assumed to be normally distributed with mean 0 and variance ). Linear regression minimizes the sum of the squared residuals . After differentiation (with repect to ) and setting the resulting equation to zero, we finally end-up with the following matrix equation:
This last formula is exactly what we are going to store in a user-defined function. We’ll call it
LinReg. Let us first think about the in- and outputs to this function. The user shall input the design Matrix and the response vector .
LinRegthen returns the coefficient vector . As these in- and outputs correspond to arrays (in Excel terms) the
LAMBDAfunction contains array functions such as
To test the
LAMBDA function you may enter concrete arrays for and . For instance, say we obtained the following aborbance measurements in the lab:
|Concentration values||Absorbance values|
Then the design matrix is:
which needs to be available as a corresponding range in Excel (say
E2:F6). is simply the vector of the absorbance values and must also be available as a dedicated range in Excel (say
C2:C6). Then you can test the above
LAMBDA function by inputting these ranges:
The result vector should be
Finally, you can again go to the
Name Manager under
Formulas and assign the function name
LinReg so this function becomes available for calculation as any other conventional Excel function.
Recursive LAMBDA functions
Here is where I think that the
LAMBDA has a great potential to replace VBA-code. You can use
LAMBDA functions in a recursive manner. What does that mean? The
LAMBDA function calls itself inside the body of the function definition. That sounds a bit abstract. So let’s make a simple example from maths. The factorial of a non-negative integer is defined by
By definition, 0! is equal to 1. The factorial of 5 is . Although Excel already has a built-in function to calculate the factorial, namely
FACT, we want to create a custom function
FACTORIAL that calculates exactly the same as
FACT and thus learn what recursive
LAMBDA function actually means. Here is how you do it:
If you type this into a cell in Excel you get again the
#CALC! error that we already know. However, even if you pass 5 as input
you will not succeed and get a
#NAME? error as the expression
FACTORIAL is unknown to Excel. Before we look closer at the functional expression, you must first acquaint Excel with that
FACTORIAL function. Therefore, copy the above expression (without the input 5 in brackets) and go to
FORMULAS and to
NAME MANAGER to create a
NEW... user-defined function. Paste the expression in the
Refers to: text box and assign
Name. Now the above test expression with 5 as an input should return 120 in the corresponding cell. But how does the function actually work? Let’s use our fiver example to step through the function. When inputting into the above function, the IF-statement will first be false and thus the actually recursive part,
FACTORIAL(n-1)*n will be executed. Plugging in the concrete numbers:
FACTORIAL(4)*5 is what is calculated. In the next step the FACTORIAL function will again be passed with (first part), so that we get
FACTORIAL(3)*4 and in total
FACTORIAL(3)*4*5. The moment becomes smaller or equal to 1, the if statement becomes true and the function leaves the recursive part to return a 1 only. The IF-statment in the
LAMBDA definition is very important since otherwise function executions do not stop and Excel gets stuck in a never-ending loop. When everything works at the end, you should receive 120 for an input of 5 to the
FACTORIAL function. Comparing this to Excel’s internal function
FACT, reveals the correctness of our user-defined
Now let us come to a more practical example for recursive LAMBDA function to be used in data analysis.
Calculating the Wilcoxon-Mann-Whitney U statistic using a recursive LAMBDA function
The Mann-Whitney U-test is a famous non-parametric hypothesis test to, loosely said, compare the medians of two data groups. It is actually the non-parametric pendant to the even more famous Student t-test being used to compare the means of two data groups. Compared to the Student t-test, the Mann-Whitney U-test does not make any assumption about the underlying distribution of the data. It is not my intention to go too much into detail about this test here (this is probably going to be a saparate post). Just to mention that there is one important recursive equation derived in the original paper written by Mann and Whitney to calculate the p-value corresponding to the test statistic calculated as the minimum of and :
Herein, and denote the number of data points in group and , respectively. denotes the sum of ranks.
The recursion formula from the Mann and Whitney paper is now as follows:
then denotes the aforementioned test statistic, i.e. . In the recursion formula, and can be viewed as running indices, where and run from 0 to and , respectively. As shown in their paper, Mann and Whitney state the following “constraints”: if and also that are either zero or one depending on whether or . These constraints have caused me to split the task into two
LAMBDA functions, i.e. one to calculate the -value (called
PU) incorporating the constraints and one to represent the actual recursion formula from above (we’ll call it
Below the definition of the
Make sure to assign this
LAMBDA function the name
Name Manager in Excel. The second
MWUPVAL is then as follows:
Note that this second
LAMBDA function calls the first
PU. Make sure to assign this second
LAMBDA function the name
MWUPVAL using the
Name Manger in Excel.
Finally, you might try the
MWUPVAL function on the data set below:
|Data group A (Rank)||Data group B (Rank)|
|54 (8)||42 (4)|
|23 (1)||34 (2)|
|45 (5)||65 (9)|
|53 (7)||77 (11)|
|41 (3)||46 (6)|
I’ve written the ranks in brackets corresponding to the measurement values in the table. Herein , and , . Thus, is equal to 13. Using
MWUPVAL(13,6,5) finally calculates the corresponding (one-sided) p-value of 0.396 which is rather big indicating that the two group medians are not significantly different (based on a significance level of ).
Please note that I tried to keep the section about the Mann-Whitney U-test short as this might be detailed later in a dedicated blog post. Here I just wanted to give you a (admittedly rather complex) example for a recursive
LAMBDA function. So please don’t resent me if not everything was immediately obvious!
H. B. Mann, D. R. Whitney. “On a Test of Whether one of Two Random Variables is Stochastically Larger
than the Other.” The Annals of Mathematical Statistics, 18(1) 50-60 March, 1947.
Iteratively reweighted least-square (IRLS) using a recursive LAMBDA function
Now comes the surpreme discipline. We will combine the knowledge from the previous sections about linear regression in matrix form and about recursive
LAMBDA functions to perform an iteratively reweighted least-square regression (IRLS). IRLS is often used with generalized linear models (GLM), such as with logistic regression. A conventional least-square regression is not appropriate in these cases as the pre-requisites of conventional least-square regression are violated (e.g. constant variance of the residuals). Thus, an appropriate weighting is required. The matrix equation that we are going to implement hence contains a weighting matrix that does not have fixed weighting coefficients but coefficients that are optimized within the least-square procedure:
E.g. in practice, it can occur that the scatter increases with increasing response values (e.g. often seen with ELISA readouts). To account for that is a common weighting scheme used in these cases. Remembering that the above equation becomes recursive:
In contrast to conventional (weighted) least-square regression the above equation is calculated in an iterative manner.
Now let us implement this as a
LAMBDAfunction in Excel:
The input to this function contains the maximum number of iterations , the design matrix and denoting the initial guess vector and the vector of responses. I do not want to go into the details of this function, but again you might recognize the IF-statement to ensure that the iteration fades out when approaching 1. In practice, iterations are stopped when the -value of two successive iterations are indistinguishable to some degree of (predefined) tolerance. However, this would have made the
LAMBDA function expression even more complicated and so I simply limited the number of iterations using the parameter .
TRANSPOSE are standard Excel array functions.
WLINREG, however, is the name of the user-defined function that I assigned at the end to the above
WLINREG with the data from section 2 and with , and the same design matrix as in section 2, you should get for which is slightly different from what we obtained for the conventional (non-weighted) linear regression.
Please note that the number of recursions currently seems quite limited in Excel. E.g. when I try with the
WLINREG function, Excel is working quite hardly but returns 0 at the end. Not sure, but also the number of executable recursions seems to depend on the number of function evaluations inside the
LAMBDA function. And as our
LAMBDA function has quite a lot of functions (including all the array functions), the number of recursions seems limited.
There are currently other limitations for
LAMBDA functions. One of the most obvious ones is that its use is limited to the current Excel workbook. That means a user-defined
LAMBDA function is unfortunately not available in other Excel workbooks.
Another issue is that
LAMBDA functions are not backward compatible, e.g. if you open an Excel workbook in Excel 2016 that had been created with Excel 2021, the numbers of the
LAMBDA function might be displayed first, but when you edit the cell the corresponding cell will return an error afterwards.