Learning-based systems are particularly well suited to studying the unknown behavior of a process or environment. In the semiconductor industry, processes are often only partially observable: the full state of the process is not directly visible because of uncertainties or disturbances. A model that accommodates such uncertainty about the state information of a stochastic (Markov) process is called a partially observable Markov decision process (POMDP). This study addresses the optimization of the compensation control bias of a dynamic multilayer lithography process in wafer fabrication in the presence of prior information, high dimensionality, and unmeasurable uncertainties. We show how a POMDP built on a linear state-space model with uncertainties can encode the information from past runs and layers and handle the accumulated overlay error at the current run and layer. Gibbs sampling is applied to optimize the belief function of the POMDP optimization approach.

Note to Practitioners: The multilayer overlay error of the photolithography process is one of the most prominent and challenging issues in wafer fabrication. In a multilevel manufacturing process, errors occur at each level and accumulate from the upstream operations. The optimization objective becomes even more critical in a high-mix fabrication process. In this study, a learning-based control system is combined with the state-space model to compensate for the multilayer overlay error. Gibbs sampling, a Bayesian approach, forms the core of the optimization algorithm and can be updated with engineers' domain knowledge or with information estimated from previous runs. The robustness of the proposed optimization algorithm is shown by comparing its distribution of overlay error with that of conventional methods and by the fast convergence rate of the learning algorithm.
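As a rough illustration of how Gibbs sampling can maintain a belief over an unobserved quantity, the sketch below works on a deliberately simplified scalar model rather than the multilayer linear state-space POMDP of this study. It assumes that each measured overlay error is an unknown constant bias plus Gaussian noise, with a Normal prior on the bias (standing in for prior or domain knowledge) and an inverse-gamma prior on the noise variance. The function names `gibbs_bias_belief` and `suggested_compensation`, the priors, and all default values are hypothetical choices made for this example and are not taken from the study.

```python
import random


def gibbs_bias_belief(obs, mu0=0.0, tau0_sq=1.0, a0=2.0, b0=1.0,
                      n_iter=2000, burn_in=500, seed=0):
    """Draw posterior samples of (bias, noise_var) by Gibbs sampling.

    Assumed toy model (not the study's model):
        obs[i]    ~ Normal(bias, noise_var)
        bias      ~ Normal(mu0, tau0_sq)       # prior knowledge about the bias
        noise_var ~ InverseGamma(a0, b0)       # shape a0, scale b0
    """
    rng = random.Random(seed)
    n = len(obs)
    total = sum(obs)
    bias, noise_var = mu0, 1.0  # arbitrary starting point for the chain
    samples = []
    for it in range(n_iter):
        # Step 1: sample bias from its Normal full conditional.
        precision = 1.0 / tau0_sq + n / noise_var
        mean = (mu0 / tau0_sq + total / noise_var) / precision
        bias = rng.gauss(mean, (1.0 / precision) ** 0.5)

        # Step 2: sample noise_var from its inverse-gamma full conditional.
        shape = a0 + 0.5 * n
        rate = b0 + 0.5 * sum((y - bias) ** 2 for y in obs)
        # random.gammavariate takes (shape, scale); the reciprocal of a
        # Gamma(shape, scale=1/rate) draw is InverseGamma(shape, rate).
        noise_var = 1.0 / rng.gammavariate(shape, 1.0 / rate)

        if it >= burn_in:  # discard early draws taken before the chain settles
            samples.append((bias, noise_var))
    return samples


def suggested_compensation(samples):
    """Return the control offset that cancels the posterior-mean bias."""
    mean_bias = sum(b for b, _ in samples) / len(samples)
    return -mean_bias


if __name__ == "__main__":
    # Synthetic overlay errors with a true bias of 3.0 (arbitrary units).
    data_rng = random.Random(1)
    overlay_errors = [3.0 + data_rng.gauss(0.0, 0.5) for _ in range(50)]
    belief = gibbs_bias_belief(overlay_errors)
    print("suggested compensation:", suggested_compensation(belief))
```

In this toy setting the belief is simply the collection of posterior samples, and the compensation is the negative of their mean bias. A multilayer version would replace the scalar bias with the state vector of the linear state-space model and condition on the information carried over from previous runs and layers.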