My research team at Virginia Tech recently began developing a new multi-GPU-accelerated incompressible flow solver based on high-order discontinuous Galerkin discretizations, as part of the LLNL/ANL Center for Efficient Exascale Discretizations funded by the DOE Exascale Computing Project. I will discuss how GPU memory and processor architecture have influenced the design of the flow solver. In particular, I will discuss the strategies we are evaluating to efficiently time-step the incompressible Navier-Stokes equations, with emphasis on temporal splitting, optimization of the advection steps, and preconditioning of the elliptic solves. I will further describe how we are using empirically determined models to guide the optimization of our GPU compute kernels.
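
For context, the governing equations are the incompressible Navier-Stokes equations, written below in a standard form, together with a generic projection-type temporal splitting. This sketch is only meant to illustrate the structure such schemes share (an advection substep followed by elliptic pressure and viscous solves); it is not necessarily the specific splitting used in our solver.

\documentclass{article}
\usepackage{amsmath} % for the align* environment
\begin{document}

% Incompressible Navier-Stokes: momentum equation and divergence-free constraint.
\begin{align*}
  \frac{\partial \mathbf{u}}{\partial t}
    + (\mathbf{u}\cdot\nabla)\mathbf{u}
    &= -\nabla p + \nu\,\nabla^{2}\mathbf{u}, \\
  \nabla\cdot\mathbf{u} &= 0.
\end{align*}

% One generic projection-type splitting of a single time step (illustrative only):
% an explicit advection substep, a pressure Poisson solve with projection,
% and an implicit viscous (Helmholtz) solve -- the latter two being the
% elliptic solves that require effective preconditioning.
\begin{align*}
  \frac{\hat{\mathbf{u}} - \mathbf{u}^{n}}{\Delta t}
    &= -(\mathbf{u}^{n}\cdot\nabla)\mathbf{u}^{n},
    && \text{explicit advection,} \\
  \nabla^{2} p^{\,n+1}
    &= \frac{1}{\Delta t}\,\nabla\cdot\hat{\mathbf{u}},
    \qquad
    \hat{\hat{\mathbf{u}}} = \hat{\mathbf{u}} - \Delta t\,\nabla p^{\,n+1},
    && \text{pressure Poisson solve and projection,} \\
  \mathbf{u}^{n+1} - \nu\,\Delta t\,\nabla^{2}\mathbf{u}^{n+1}
    &= \hat{\hat{\mathbf{u}}},
    && \text{implicit viscous (Helmholtz) solve.}
\end{align*}

\end{document}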