"It's an exciting time to be alive." -- a biological mortal being


I'm a researcher interested in solving the problem of Artificial General Intelligence (AGI). I'm of the opinion that this might be the last problem humanity will ever solve before we find/lose our purpose. In my research, I use General Reinforcement Learning (GRL) as a de facto framework to capture (and argue about) the history-based decision-making process of an AGI without making the usual Markovian, ergodic or computability assumptions.

I consider myself lucky to have been supervised by Prof. Marcus Hutter during my PhD at the Australian National University (ANU). The main focus of my PhD dissertation was to identify useful abstractions of GRL that reduce representational complexity, which aids both learning and planning in a wide range of environments.

Before I came "down under", I completed an MPhil at Quaid-i-Azam University (QAU) under the wise supervision of Prof. Hasan Mahmood. In my MPhil thesis, I applied game-theoretic methods to study the induced cooperative behavior among autonomous communication agents.

Research Profile

The following is a list of selected published research papers. The complete list can be found on my Google Scholar profile.

Exact Reduction of Huge Action Spaces in General Reinforcement Learning (AAAI 2021)

In this work, we propose a general-purpose method to reduce huge action-spaces in GRL to drastically improve the upper bound on the size of the state-space of the surrogate-MDP of Extreme State Aggregation.

Performance Guarantees for Homomorphisms Beyond Markov Decision Processes (AAAI 2019)

We extend the Extreme State Aggregation framework to state-action abstractions (a.k.a. homomorphisms). We show that even some non-Markovian homomorphisms are able to represent the optimal policy of the environment through a surrogate-MDP.

Conditions on Features for Temporal Difference-Like Methods to Converge (IJCAI 2019)

Joint work with Samuel Yang-Zhao, led by Marcus Hutter. In this work, we prove sufficient conditions on the choice of features which allow TD-like algorithms to converge to a correct solution.

On Q-learning Convergence for Non-Markov Decision Processes (IJCAI 2018)

By exploiting the structure of non-Markovian abstractions, we prove that Q-learning can converge in a much broader class of problems than traditionally believed. Our results explain why Q-learning sometimes works in practice on non-Markovian domains.

Active Projects

At the moment, I'm working (non-exclusively) on the following major projects. Each project may encompass many stepping-stone problems/papers. If you want to discuss or collaborate on one of these projects (or any other), please feel free to reach out!

Arena: A Multi-agent, Multi-paradigm, History-based Decision-making Framework

I conceived Arena after carefully evaluating many available alternatives, e.g. PettingZoo and Gym, for simulating GRL agents. Understandably, these frameworks are quite tightly connected to Markovian state-spaces, and there is no "out-of-the-box" support for history-based environments. I wanted something which could easily manage multiple information sets at each time-step, e.g. different agents can have different perceptions of the same environment, or only a subset of agents may act simultaneously. I also needed a framework where agents and environments are of the same type, i.e. an environment is also an "agent" which is simply not controlled by any other agent. And the list goes on and on ...

This is an ongoing project in its infancy at the moment. It needs more eyes and brains to mature. I am developing it in Python, but I plan to delegate the heavy lifting to C++ down the line.
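
To make the requirements above concrete, here is a minimal Python sketch of the idea. It is not Arena's actual API: the names Actor, ParityAgent, TwoViewEnvironment, run and the schedule argument are all hypothetical and chosen only for illustration. It shows a single type shared by agents and environments, per-agent histories (information sets), and a step where only a subset of agents acts.

```python
from typing import Any, Callable, Dict, List, Tuple

History = Tuple[Any, ...]


class Actor:
    """Hypothetical shared type: agents and environments both map a history to an output."""

    def act(self, history: History) -> Any:
        raise NotImplementedError


class ParityAgent(Actor):
    """Acts on its whole history: returns the parity of the sum of all percepts so far."""

    def act(self, history: History) -> int:
        return sum(history) % 2


class TwoViewEnvironment(Actor):
    """An environment is just another Actor: its 'action' is one percept per agent."""

    def act(self, history: History) -> Dict[str, int]:
        t = len(history)  # number of joint actions received so far
        # Same time-step, but each agent perceives it differently.
        return {"a": t, "b": 2 * t}


def run(env: Actor, agents: Dict[str, Actor],
        schedule: Callable[[int], List[str]], steps: int):
    """Simulate `steps` rounds; `schedule(t)` names the agents allowed to act at step t."""
    env_history: History = ()
    agent_histories: Dict[str, History] = {name: () for name in agents}
    for t in range(steps):
        # The environment "acts" by emitting a percept for every agent.
        percepts = env.act(env_history)
        for name, percept in percepts.items():
            agent_histories[name] += (percept,)
        # Only the scheduled subset of agents acts at this time-step.
        joint_action = {name: agents[name].act(agent_histories[name])
                        for name in schedule(t)}
        env_history += (joint_action,)
    return agent_histories, env_history


if __name__ == "__main__":
    agents = {"a": ParityAgent(), "b": ParityAgent()}
    # Agent "b" only acts on even time-steps (an arbitrary illustrative schedule).
    histories, joint = run(TwoViewEnvironment(), agents,
                           lambda t: ["a", "b"] if t % 2 == 0 else ["a"], steps=3)
    print(histories)
    print(joint)
```

Nothing here depends on a Markovian state: every actor, including the environment, only ever sees its own history.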

Non-Markovian Feature Reinforcement Learning

The (sub-)problem of abstraction learning is known by different names in different AI sub-communities: representation learning, model selection, feature learning, neural network architecture search (AutoML), and hyper-parameter tuning are all forms of abstraction learning. That is why I consider this to be the fundamental problem in the field of AI.

I am developing algorithms which can learn a compact model of any history-based environment through RL. The project is a non-Markovian extension of Feature Reinforcement Learning (FRL), as the Markovian assumption is often hard to satisfy in practice. As in FRL, an abstraction is a mapping from finite observational histories to the internal states of the agent, which is a much more general setup than the popular SOTA architectures, e.g. attention networks and transformers.

Moreover, I do not demand the state-space to be Markovian. Therefore, the learned non-Markovian abstraction may not be able to predict the next state; however, it should be sufficient for the agent to behave optimally.
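
As a rough illustration of what "a mapping from histories to internal states" means, here is a minimal Python sketch. The abstraction used (the last k observations) is fixed by hand rather than learned, and the names last_k_observations and AbstractQLearner are hypothetical; this is not the project's algorithm. It only shows a tabular Q-learner that is indexed by the abstract state phi(h) instead of an environment state, and such a phi need not be Markovian.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Tuple

# A history is a sequence of (action, observation, reward) triples.
Step = Tuple[int, int, float]
History = Tuple[Step, ...]
Abstraction = Callable[[History], Hashable]


def last_k_observations(k: int) -> Abstraction:
    """Hand-picked abstraction (an assumption for illustration): keep the last k >= 1 observations."""
    def phi(history: History) -> Hashable:
        return tuple(obs for (_, obs, _) in history[-k:])
    return phi


class AbstractQLearner:
    """Tabular Q-learning where the 'state' is phi(history), not an environment state."""

    def __init__(self, phi: Abstraction, actions: Iterable[int],
                 alpha: float = 0.5, gamma: float = 0.9):
        self.phi = phi
        self.actions = list(actions)
        self.alpha = alpha
        self.gamma = gamma
        self.q = defaultdict(float)  # maps (abstract_state, action) -> value

    def greedy_action(self, history: History) -> int:
        s = self.phi(history)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, history: History, action: int,
               observation: int, reward: float) -> History:
        """Apply one Q-learning update on abstract states and return the extended history."""
        s = self.phi(history)
        next_history = history + ((action, observation, reward),)
        s_next = self.phi(next_history)
        target = reward + self.gamma * max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, action)] += self.alpha * (target - self.q[(s, action)])
        return next_history


if __name__ == "__main__":
    phi = last_k_observations(1)
    learner = AbstractQLearner(phi, actions=[0, 1])
    h = learner.update((), action=1, observation=7, reward=1.0)
    print(phi(h), learner.greedy_action(()))
```

Two different histories that end in the same observation are mapped to the same internal state, which is exactly where the compression (and the possible loss of the Markov property) comes from.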

Hierarchical Abstraction Reinforcement Learning

I coined the term Abstraction Reinforcement Learning (ARL) in my PhD thesis as a framework for GRL with abstraction maps. In this project, I aim to extend this setup to a hierarchical setting, where the agents (or the policies per se) use different state abstractions at different levels of the hierarchy. For example, as in Feudal RL, the agent at the top can first "identify" the task (e.g. driving a car, making a cup of coffee, or playing chess) and then "deploy" a specialized agent down in the hierarchy to solve that task.

Although this hierarchical ARL (HARL) setup only uses state abstractions, the hierarchy also "mimics" an action abstraction. Thus, it neatly unifies state and action abstractions.


I co-lectured (with Elliot Catt) a graduate-level course at ANU about AIXI, a theoretically proven but incomputable "gold standard" AGI.

COMP4620: Advanced Topics in Artificial Intelligence

Take a look at my CV for a detailed snapshot of my professional profile.