Correct fusion of data from two sensors requires an accurate estimate of their relative pose, which can be determined through extrinsic calibration. When the sensors are capable of producing their own egomotion estimates (i.e., measurements of their trajectories through an environment), the `hand-eye' formulation of extrinsic calibration can be employed. In this paper, we extend our recent work on a convex optimization approach for hand-eye calibration to the case where one of the sensors cannot observe the scale of its translational motion (e.g., a monocular camera observing an unmapped environment). We prove that our technique yields a certifiably globally optimal solution to both the known- and unknown-scale variants of hand-eye calibration, provided that the measurement noise is bounded. Herein, we focus on the theoretical aspects of the problem, show the tightness and stability of our convex relaxation, and demonstrate the optimality and speed of our algorithm through experiments with synthetic data.
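To make the `hand-eye' constraints concrete, the following is a minimal Python sketch that only evaluates them on one synthetic motion pair; it is not the convex relaxation or certifiable solver described here. It assumes the standard form $A_i X = X B_i$, with sensor $b$ taken (as an assumption) to be the monocular camera, so its measured translation $t_B$ is known only up to an unknown positive scale $\alpha$. The constraints then read $R_A R_X = R_X R_B$ and $R_A t_X + t_A = \alpha R_X t_B + t_X$. All function names and numerical values are hypothetical and chosen for illustration.

```python
import math

# Small pure-Python 3x3 helpers (hypothetical, for illustration only).
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def rot(axis, angle):
    """Rodrigues' formula: rotation by `angle` radians about `axis`."""
    x, y, z = axis
    n = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]

def hand_eye_residual(R_A, t_A, R_B, t_B, R_X, t_X, alpha=1.0):
    """Max-abs residuals of A X = X B, with sensor b's translation scaled by alpha."""
    lhs_R, rhs_R = matmul(R_A, R_X), matmul(R_X, R_B)
    rot_res = max(abs(lhs_R[i][j] - rhs_R[i][j]) for i in range(3) for j in range(3))
    lhs_t = [u + v for u, v in zip(matvec(R_A, t_X), t_A)]
    rhs_t = [alpha * u + v for u, v in zip(matvec(R_X, t_B), t_X)]
    trans_res = max(abs(a - b) for a, b in zip(lhs_t, rhs_t))
    return rot_res, trans_res

# Hypothetical ground-truth extrinsic transform X = (R_X, t_X).
R_X = rot((1.0, 2.0, 3.0), 0.7)
t_X = [0.1, -0.2, 0.3]

# One metric motion of sensor b; the monocular camera only reports t_B / alpha_true.
R_B = rot((0.0, 1.0, 1.0), 0.4)
t_B_metric = [0.5, 0.0, -0.25]
alpha_true = 2.0
t_B_obs = [u / alpha_true for u in t_B_metric]

# Sensor a's motion implied by A = X B X^{-1}.
R_A = matmul(matmul(R_X, R_B), transpose(R_X))
t_A = [p + q - r for p, q, r in zip(matvec(R_X, t_B_metric), t_X, matvec(R_A, t_X))]

# With the true scale both residuals vanish (up to floating-point error);
# assuming alpha = 1 leaves a nonzero translational residual.
print(hand_eye_residual(R_A, t_A, R_B, t_B_obs, R_X, t_X, alpha=alpha_true))
print(hand_eye_residual(R_A, t_A, R_B, t_B_obs, R_X, t_X, alpha=1.0))
```

The sketch shows why the unknown-scale variant adds $\alpha$ as an extra decision variable: the rotational constraint is unaffected by scale, but the translational constraint holds only at the correct $\alpha$.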