This talk has two parts. In the first part, I will present our recent work on developing data-dependent implicit activations. Numerical results on deep neural networks (DNNs) derived from VGG and ResNet show three remarkable advantages: 1) we are able to train very deep networks with a tiny amount of training data; 2) on the CIFAR10 and CIFAR100 benchmarks, we achieve a relative accuracy improvement of 20 to 30 percent over the base DNNs; 3) to reach the same level of accuracy, our model’s size is tens of times smaller than that of its peers. In the second part, I will present a two-scale model for spatiotemporal data analysis. First, we use a stochastic process model to infer the Lagrangian representation of the space, which reveals macroscale interactions between different regions. Second, to fit the time series on the tessellated space with good generalization ability, we develop a graph-stacked recurrent neural network to approximate the historical data. We obtain state-of-the-art results on both crime and traffic forecasting problems. The first part is joint work with Professors Stanley J. Osher and Zuoqiang Shi. The second part is collaborative work with Professors Andrea L. Bertozzi and P. Jeffrey Brantingham, and Mr. Xiyang Luo and Mr. Fangbo Zhang.