@ShizaCharania
finding the optimal policy for my perceived reward function | prev. computer vision + ai alignment research