In this section we introduce a simple decision tree, designed to provide guidance in answering the question of whether a given problem is solvable via conic optimization. Note that, owing to the complexity of the topic and the many special cases, the decision tree can only provide guidance, not a definitive answer. For a general introduction to conic optimization see, e.g., Boyd and Vandenberghe (2004) or Ben-Tal and Nemirovski (2001).
A general way to determine whether a given optimization problem is solvable via conic optimization is illustrated in Figure 0.1. More specifically, an optimization problem is solvable via conic optimization if it is convex and if the objective and constraints can be represented using convex cones. In all other cases the problem is, in principle, not solvable via conic optimization, although exceptions exist. Additional explanations of the decision tree are given in the subsection below.
Another rather straightforward way to verify that a problem is solvable via conic optimization is to enter the problem into one of the CVX (Grant and Boyd 2014) domain-specific language implementations. CVX uses disciplined convex programming (DCP) (Grant 2004) to verify that a given problem is convex and conic representable. Note that DCP only establishes the implication “if DCP compliant, then solvable via convex solvers”; the converse need not hold. Indeed, we encountered several problems which are known to be convex and representable via conic optimization but are not DCP compliant, e.g., log-binomial regression. Regarding the CVX implementations, cvxpy (Diamond and Boyd 2016) for Python is the main focus of the Stanford University Convex Optimization Group (cvxgrp). For R there exists the CVXR package (Fu, Narasimhan, and Boyd 2020).
A problem is called conic representable if there exists a cone that can express the problem. State-of-the-art solvers distinguish between up to eight different types of cones, including the linear, second-order, positive semidefinite, power, and exponential cones. Therefore, if your problem is convex and composed of linear, quadratic, power, exponential, or log terms, there is a good chance that it can be expressed via conic optimization. Nevertheless, note that this need not hold for every problem containing such terms (an example is presented in Section 4.4).
Definitions of these cones can be found in, e.g., Diamond and Boyd (2015) or Theußl, Schwendinger, and Hornik (2020). To provide more guidance on this topic, we discuss the conic representability of several GLMs in Section 4.
In general, conic optimization solvers are designed to solve convex optimization problems. However, for some non-convex problems there exist convex relaxations which make it possible to solve these problems via conic optimization after all.
One such exception is mixed-integer optimization problems, which are non-convex. However, if the problem without the integrality constraints can be solved via conic optimization, then so can the problem with integrality constraints. This is possible since the sub-problems solved within the branch-and-bound algorithm are again convex.
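The mechanism can be sketched with a toy example: branch and bound on a one-dimensional convex objective, where each convex relaxation over an interval can be solved in closed form (in practice the relaxations would be solved by a conic solver). All numbers below are illustrative.

```python
import math

def solve_relaxation(lo, hi):
    """Convex sub-problem: minimize (x - 2.6)^2 over [lo, hi].
    Closed form: project the unconstrained minimizer onto the interval."""
    x = min(max(2.6, lo), hi)
    return x, (x - 2.6) ** 2

def branch_and_bound(lo=-10.0, hi=10.0, best=(None, math.inf)):
    x, val = solve_relaxation(lo, hi)
    if val >= best[1]:
        return best               # prune: relaxation no better than incumbent
    if abs(x - round(x)) < 1e-9:
        return (round(x), val)    # integral solution -> new incumbent
    # branch on the fractional value: x <= floor(x) or x >= ceil(x)
    best = branch_and_bound(lo, math.floor(x), best)
    best = branch_and_bound(math.ceil(x), hi, best)
    return best

best_solution = branch_and_bound()
print(best_solution)  # (3, 0.16...): the integer closest to 2.6 on the better side
```

Each node of the search tree only ever solves a convex relaxation, which is why conic solvability of the relaxed problem carries over to the mixed-integer problem.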
If the problem cannot be expressed directly by the cones supported by the solvers, there sometimes exist equivalent problems or approximations which can be solved via conic optimization. An example of such an exception is the binomial GLM with probit link (see Section 4.3).
It is well known that logistic regression is a convex optimization problem, whose MLE is finite and unique when the data is not separated. More information about separation detection can be found in, e.g., Kosmidis and Schumacher (2020). The MLE can be obtained by maximizing the log-likelihood function, \[ \underset{\beta}{\text{maximize}} \sum_{i \in I} X_{i*} \beta - \sum_{k \in I \cup J} \log(1 + \exp(X_{k*} \beta)) ~~ \text{where} ~~ I = \{i| y_i = 1\}, J = \{j| y_j = 0\}. \] Knowing that the problem is convex and looking at the log-likelihood function, one sees that the objective is composed of sums of linear terms, logarithms, and exponential functions. From this information one can infer that there is a good chance that the problem is representable by making use of the exponential cone. More information on modeling problems with the exponential cone can be found in Theußl, Schwendinger, and Hornik (2020), on the ROI homepage, and in the MOSEK Modeling Cookbook.
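The log-likelihood above can also be maximized with a general-purpose solver, which is a useful sanity check before attempting a conic formulation. A minimal sketch with scipy on simulated data (all data and parameter values are illustrative; `np.logaddexp(0, eta)` evaluates \(\log(1+\exp(\eta))\) stably):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])  # intercept + covariate
beta_true = np.array([0.3, -1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def neg_log_lik(beta):
    eta = X @ beta
    # y @ eta collects the sum over I; logaddexp covers the sum over I union J
    return -(y @ eta - np.logaddexp(0, eta).sum())

fit = minimize(neg_log_lik, np.zeros(2), method="BFGS")
print(fit.x)  # MLE estimate of beta
```

With separated data the same code would diverge (the likelihood has no finite maximizer), which is why the separation check mentioned above matters.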
The log-binomial regression model (LBRM) is known to be convex, \[ \begin{array}{rl} \underset{\beta}{\textrm{maximize}} & \displaystyle\sum_{i = 1}^n y_i ~ X_{i*} \beta + (1 - y_i) ~ \log(1 - \exp(X_{i*} \beta)) \\ \textrm{subject to} & X \beta \leq 0. \end{array} \] Similar to logistic regression, the finiteness of the MLE depends on the separation of the data. However, due to the linear constraint \(X \beta \leq 0\), the MLE of the log-binomial regression model exists also in cases where the MLE of the logistic regression model does not. Schwendinger, Grün, and Hornik (2021) point out that the existence of the MLE of the log-binomial regression model can be verified by solving a linear optimization problem.
Provided the MLE exists, the LBRM can be solved by making use of the linear and exponential cones. The linear cone is needed for the \(X \beta \leq 0\) constraint.
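As a numerical sketch of the constrained problem, the LBRM log-likelihood can be maximized subject to \(X \beta \leq 0\) with scipy's SLSQP (simulated, illustrative data; the clipping margin inside the objective is a numerical safeguard against infeasible trial points during the line search, not part of the model):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(40), rng.uniform(-1, 0, size=40)])
beta_true = np.array([-0.5, 0.8])
p = np.exp(np.clip(X @ beta_true, None, 0))  # log link: P(y=1) = exp(X beta)
y = rng.binomial(1, p)

def neg_log_lik(beta):
    eta = np.clip(X @ beta, None, -1e-10)  # guard: log(1 - exp(eta)) needs eta < 0
    return -(y @ eta + (1 - y) @ np.log1p(-np.exp(eta)))

# linear constraint X beta <= 0, written as -X beta >= 0 for SLSQP
cons = {"type": "ineq", "fun": lambda beta: -(X @ beta)}
fit = minimize(neg_log_lik, np.array([-1.0, 0.0]), method="SLSQP", constraints=cons)
print(fit.x, fit.success)
```

A conic formulation replaces the generic solver with the exponential cone for the \(\log(1-\exp(\cdot))\) terms and the linear cone for the constraint; the sketch above only illustrates the shape of the optimization problem.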
The MLE of the probit model can be obtained by solving, \[ \underset{\beta}{\text{maximize}} ~~ \sum_{i = 1}^n y_i ~ \log(\Phi(X_{i*} \beta)) + \sum_{i = 1}^n (1 - y_i) \log(1 - \Phi(X_{i*} \beta)). \]
Since there exists no cone to express the cumulative normal distribution function \(\Phi\), the problem cannot be solved directly with conic solvers. Page (1977) gives a simple approximation \[ \Phi(x) \approx \frac{\exp \left( 2 \sqrt{\frac{2}{\pi}} ~ x \right)}{1 + \exp \left( 2 \sqrt{\frac{2}{\pi}} ~ x \right)}. \] Using this approximation, the probit model can be estimated via logistic regression, where either the data or the estimates are re-scaled by \(2 \sqrt{\frac{2}{\pi}}\). Many approximations to the cumulative normal exist, and most of them are more accurate than this one. However, this approximation has the advantage that it can be expressed via conic optimization.
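The quality of this logistic approximation is easy to check numerically. The sketch below uses the scale \(2\sqrt{2/\pi} \approx 1.596\), which matches the slope of the logistic curve to the normal density at zero, and compares against the exact \(\Phi\) via the error function:

```python
import math

def phi(x):
    """Exact standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi_logistic(x):
    """Page-style logistic approximation: Phi(x) ~ sigma(2 * sqrt(2/pi) * x)."""
    k = 2 * math.sqrt(2 / math.pi)
    return 1 / (1 + math.exp(-k * x))

grid = [i / 100 for i in range(-400, 401)]
max_err = max(abs(phi(x) - phi_logistic(x)) for x in grid)
print(max_err)  # roughly 0.018, attained near |x| = 1.75
```

An absolute error below 0.02 everywhere is coarse compared to, e.g., polynomial approximations, but it keeps the probit likelihood within the logistic (exponential-cone) family.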
The complementary log-log (cloglog) model is known to be convex, and the MLE is finite and unique if the data is not separated (see, e.g., Silvapulle (1981) or Kaufmann (1988)).
\[ \underset{\beta}{\text{maximize}} ~~ \sum^n_{i = 1} y_i ~ \log(1 - \exp(-\exp(X_{i*} \beta))) - (1 - y_i) ~ \exp(X_{i*} \beta) \]
Since the problem is convex and the log-likelihood is composed only of \(\log\) and \(\exp\) terms, the problem should in principle be representable via the exponential cone. However, we did not find a problem formulation that worked as expected. We suspect that, since one exponential cone has to be nested inside another, the problem is badly scaled and the solution therefore becomes unreliable.
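Even without a working conic formulation, the cloglog log-likelihood above can be maximized with a general-purpose solver. A minimal scipy sketch on simulated, illustrative data (the clipping of the linear predictor is a numerical safeguard against overflow, not part of the model):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(60), rng.normal(size=60)])
beta_true = np.array([-0.2, 0.7])
p = 1 - np.exp(-np.exp(X @ beta_true))  # cloglog link: P(y=1)
y = rng.binomial(1, p)

def neg_log_lik(beta):
    t = np.exp(np.clip(X @ beta, -30, 30))  # t = exp(eta), clipped for safety
    # log(1 - exp(-t)) evaluated stably as log(-expm1(-t))
    return -(y @ np.log(-np.expm1(-t)) - (1 - y) @ t)

fit = minimize(neg_log_lik, np.zeros(2), method="BFGS")
print(fit.x, fit.success)
```

The doubly exponential structure visible in `neg_log_lik` is exactly what would require nesting exponential cones in a conic formulation.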
Ben-Tal, A., and A. Nemirovski. 2001. Lectures on Modern Convex Optimization. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9780898718829.
Boyd, Stephen, and Lieven Vandenberghe. 2004. Convex Optimization. New York, NY, USA: Cambridge University Press.
Diamond, Steven, and Stephen Boyd. 2015. “Convex Optimization with Abstract Linear Operators.” In Proceedings of the IEEE International Conference on Computer Vision, 675–83. IEEE Computer Society. https://doi.org/10.1109/iccv.2015.84.
———. 2016. “CVXPY: A Python-Embedded Modeling Language for Convex Optimization.” Journal of Machine Learning Research 17 (83): 1–5.
Fu, Anqi, Balasubramanian Narasimhan, and Stephen Boyd. 2020. “CVXR: An R Package for Disciplined Convex Optimization.” Journal of Statistical Software 94 (14): 1–34. https://doi.org/10.18637/jss.v094.i14.
Grant, Michael. 2004. “Disciplined Convex Programming.” PhD thesis, Stanford University. https://web.stanford.edu/~boyd/papers/pdf/mcg_thesis.pdf.
Grant, Michael, and Stephen Boyd. 2014. “CVX: Matlab Software for Disciplined Convex Programming, Version 2.1.” https://cvxr.com/cvx/.
Kaufmann, H. 1988. “On Existence and Uniqueness of Maximum Likelihood Estimates in Quantal and Ordinal Response Models.” Metrika 35 (1): 291–313. https://doi.org/10.1007/bf02613318.
Kosmidis, Ioannis, and Dirk Schumacher. 2020. “Detect/Check for Separation and Infinite Maximum Likelihood Estimates in Logistic Regression.” https://cran.r-project.org/package=detectseparation/vignettes/separation.html.
Page, E. 1977. “Approximations to the Cumulative Normal Function and Its Inverse for Use on a Pocket Calculator.” Journal of the Royal Statistical Society C 26 (1): 75–76. https://doi.org/10.2307/2346872.
Schwendinger, Florian, Bettina Grün, and Kurt Hornik. 2021. “A Comparison of Optimization Solvers for Log Binomial Regression Including Conic Programming.” Computational Statistics 36 (3): 1721–54. https://doi.org/10.1007/s00180-021-01084-5.
Silvapulle, Mervyn J. 1981. “On the Existence of Maximum Likelihood Estimators for the Binomial Response Models.” Journal of the Royal Statistical Society. Series B (Methodological) 43 (3): 310–13.
Theußl, Stefan, Florian Schwendinger, and Kurt Hornik. 2020. “ROI: An Extensible R Optimization Infrastructure.” Journal of Statistical Software 94. https://doi.org/10.18637/jss.v094.i15.