CV |
Bio |
Google Scholar
|
I'm a Ph.D. graduate in Computer Science from the University of Minnesota, Twin Cities, advised by Prof. Shashi Shekhar. My thesis committee included Prof. Vipin Kumar, Prof. Ravi Janardan, and Prof. Ying Song. My research lies at the intersection of spatial data science, spatiotemporal optimization, and machine learning. I develop physics- and knowledge-guided models for spatial reasoning over massive trajectory and remote-sensing data, focusing on diffusion models, robust anomaly detection, and decision-focused AI for maritime safety, climate risk, and biodiversity conservation. Recent projects include Pi-DPM, a physics-informed diffusion model for detecting adversarial GPS spoofing in vessel trajectories; Kriging-informed diffusion models and spatial neural surrogates for downscaling coarse climate fields; and trajectory data-mining methods for large-scale geospatial anomaly detection under missing data and distribution shift. Previously, I was a research intern at Esri, where I developed a scalable Graph-based Traffic Representation and Association (GTRA) framework for maritime route optimization and designed an anomaly detection pipeline integrating Transformer models and Evidential Deep Learning for real-time, large-scale GIS applications. Before my Ph.D., I earned an M.S. in Computer Science from the University at Buffalo in 2018 under the guidance of Prof. Varun Chandola. My work has been recognized with the University of Minnesota Doctoral Dissertation Fellowship and appears in leading AI and spatial computing venues. I also serve as a reviewer for major AI and data science conferences such as NeurIPS, ICML, ICLR, CVPR, AAAI, and SIGSPATIAL. 🎓 On the Academic Job Market! |
Research Statement / Teaching Statement / Diversity Statement / Ph.D. Thesis
CSCI 8715: Spatial Data Science Research (Spring 2024) CSCI 5715: Spatial Data Science (Fall 2019) CSCI 5708: Database Systems (Spring 2019) CSCI 4041: Data Structures and Algorithms (Fall 2018) |
|
last update: Dec 2024 |
|
paper |
abstract |
bibtex
Given a dataset of moving object trajectories, a domain-specific study area, and a user-defined error threshold, we aim to identify anomalous trajectories indicative of possible GPS spoofing (e.g., broadcasting fake signals). The problem is societally important to curb illegal activities such as unauthorized fishing and illicit oil transfers in international waters. The problem is challenging due to advances in AI-generated deep fakes (e.g., additive noise, fake trajectories) and the scarcity of labeled samples for ground-truth verification. Current state-of-the-art methods ignore fine-scale spatiotemporal dependencies and prior physical knowledge, resulting in lower accuracy. In this paper, we propose a physics-informed anomaly detection framework based on an encoder-decoder architecture that incorporates kinematic constraints to identify trajectories that violate physical laws. Experimental results on maritime and urban domains demonstrate that the proposed approach yields higher solution quality and lower estimation error for anomaly detection and trajectory reconstruction tasks, respectively. |
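To make the kinematic-constraint idea concrete, here is a minimal sketch of flagging trajectory points whose implied speed violates a physical limit. It is a toy illustration only, not the paper's encoder-decoder model; the function name, units, and threshold are all assumptions.

```python
import math

def flag_kinematic_violations(traj, v_max=15.0):
    """Flag GPS fixes whose implied speed exceeds a physical limit
    (e.g., a vessel's maximum speed in m/s).

    traj: list of (t_seconds, x_m, y_m) tuples in a projected CRS.
    Returns indices of the second fix in each violating pair.
    """
    violations = []
    for i in range(1, len(traj)):
        t0, x0, y0 = traj[i - 1]
        t1, x1, y1 = traj[i]
        dt = t1 - t0
        if dt <= 0:
            violations.append(i)  # non-monotone timestamps are suspect too
            continue
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        if speed > v_max:
            violations.append(i)
    return violations

# A jump of 10 km in 60 s implies ~167 m/s, far beyond v_max = 15 m/s.
traj = [(0, 0.0, 0.0), (60, 500.0, 0.0), (120, 10500.0, 0.0)]
print(flag_kinematic_violations(traj))  # [2]
```

A rule like this catches only the crudest spoofs; the paper's point is to embed such physical knowledge inside a learned model rather than apply it as a post-hoc filter.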
|
paper |
abstract |
bibtex
Given a set of historical vehicle trajectories and their descriptive attributes, the goal is to train a generative model that produces synthetic trajectories with high physical fidelity. Here, physical fidelity is defined as fidelity to both geometric and dynamic properties of trajectories. The problem is important since trajectory generation can contribute to data augmentation for many traffic-related applications, such as popular route discovery and traffic light control. The key challenge lies in achieving high physical fidelity under coarse geospatial attributes (e.g., origin-destination pairs) that lack fine-grained details. Current methods, which mostly focus on geometric properties, have limited utility in domain-specific scenarios due to their neglect of trajectory dynamics. To address these limitations, we propose GCDM, a novel Geo-Lucid Conditional Diffusion Model framework that integrates road map attributes into the generative process through spatially hierarchical generation and map-informed latent variables. Experiments on real-world vehicle trajectory datasets show that GCDM outperforms state-of-the-art methods in geo-distribution similarity and dynamics fidelity. |
|
paper |
abstract |
bibtex
This work introduces physics-guided generative foundation models (PgGenFMs), a class of generative models that systematically integrate broad and narrow physical knowledge into data, training, and architecture design. The paper motivates PgGenFMs by outlining key limitations of purely data-driven foundation models, including poor out-of-distribution behavior, violations of physical laws, and lack of interpretability in scientific and engineering domains. It proposes a conceptual framework and taxonomy that contrast PgGenFMs with conventional foundation models and physics-guided task-specific models, and discusses how physical constraints can be embedded via loss terms, architectures, surrogate simulations, and hybrid designs. The paper also highlights open problems around where and how to inject domain knowledge, how to handle location dependence and bias in geospatial settings, and how to scale PgGenFMs while preserving physical consistency and transparency. |
|
paper |
abstract |
bibtex
The goal is to develop an efficient and accurate surrogate model for Daycent, a widely used but computationally expensive ecosystem model. This problem is important due to its societal applications in sustainable agriculture. Challenges include balancing the trade-off between prediction time and solution quality (e.g., accuracy), as well as the need to capture spatial relationships both within and across sites, while also accounting for varied crop management practices that introduce irregular and non-stationary patterns, reducing predictability. Related work on surrogate models with traditional feed-forward artificial neural networks (SM-ANN) has shown that these models have limited accuracy and often fail to capture spatial dependencies. To address these limitations, we explore novel Surrogate Models with Hybrid Spatial Neural Networks (SM-Hybrid) capable of explicitly modeling spatial autocorrelation and tele-connections. Experimental results show that the proposed SM-Hybrid is more accurate than SM-ANN and is twice as fast as the Daycent model. |
|
paper |
abstract |
bibtex
Given a collection of Boolean spatial features, the Super-Colocation Pattern Discovery process identifies subsets of features that are not only frequently located together but also have dense interactions. For example, the presence of multiple immune cells around cancer cells is more interesting to oncologists than simple colocation between immune and cancer cells. This problem is important due to its multiple societal applications, including oncology, economic analysis, and sports analytics. The problem is challenging due to the need to model interaction density among a subset of Boolean spatial features. Related work on colocation pattern mining is limited due to a lack of conceptual, logical, and physical models that accurately represent interaction density. Traditional interest measures (e.g., Participation Index) largely focus on the mere presence of another spatial feature type and overlook the number or density of neighboring instances. To address these limitations, we propose a novel interest measure, termed Super-Colocation Density, which utilizes a matrix or tensor along with a utility-based index to quantify the interaction density among subsets of spatial features. We also introduce novel Super-Colocation Mining algorithms and evaluate the proposed methods through both theoretical analysis and experiments with real and synthetic data. |
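For context, a toy sketch of the classical Participation Index for a two-feature pattern shows the limitation the abstract targets: the measure only asks whether a neighbor exists, not how many there are. All names and thresholds here are illustrative, not from the paper.

```python
def participation_index(a_pts, b_pts, d=1.0):
    """Participation index of pattern {A, B} for distance threshold d:
    min( fraction of A instances with at least one B neighbor,
         fraction of B instances with at least one A neighbor ).
    Note it ignores *how many* neighbors each instance has, the gap
    that an interaction-density measure is meant to fill.
    """
    def frac_with_neighbor(src, dst):
        hit = sum(1 for sx, sy in src
                  if any((sx - tx) ** 2 + (sy - ty) ** 2 <= d * d
                         for tx, ty in dst))
        return hit / len(src)
    return min(frac_with_neighbor(a_pts, b_pts),
               frac_with_neighbor(b_pts, a_pts))

a = [(0, 0), (5, 5)]               # two A instances
b = [(0.5, 0), (0, 0.5), (9, 9)]   # three B instances, two near one A
print(participation_index(a, b))   # 0.5
```

An A instance surrounded by ten B neighbors scores the same here as one with a single B neighbor, which is exactly why a density-aware measure is proposed.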
|
paper |
abstract |
bibtex
Given an origin, a destination, and a directed graph in which each edge is associated with a pair of non-negative costs, the bi-objective routing problem aims to find the set of all Pareto-optimal paths. This problem is societally important due to several applications, such as route finding that considers both vehicle travel time and energy consumption. The problem is challenging due to the potentially large number of candidate Pareto-optimal paths to be enumerated during the search, making existing compute-on-demand methods inefficient due to their high time complexity. One way forward is the introduction of precomputation algorithms. However, the large size of the Pareto-optimal set makes it infeasible to precompute and store all-pair solutions. In addition, generalizing traditional single-objective hierarchical algorithms to bi-objective cases is non-trivial because of the non-comparability of candidate paths and the need to accommodate multiple Pareto-optimal paths for each node pair. To overcome these limitations, we propose Multi-Level Bi-Objective Routing (MBOR) algorithms using three novel ideas: boundary multigraph representation, Pareto frontier encoding, and two-dimensional cost-interval-based pruning. Computational experiments using real road network data demonstrate that the proposed methods significantly outperform baseline methods in terms of online runtime and precomputation time. |
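The non-comparability of candidate paths comes from Pareto dominance over cost pairs. A minimal sketch of filtering (time, energy) pairs to their Pareto frontier (illustrative only; not the MBOR algorithm itself):

```python
def pareto_front(paths):
    """Keep only non-dominated (time, energy) cost pairs.

    A path dominates another if it is no worse in both objectives
    and strictly better in at least one.
    """
    front = []
    for c in sorted(set(paths)):  # ascending by time, then energy
        # After sorting, c survives iff its energy strictly improves
        # on every kept pair (all kept pairs have smaller-or-equal time).
        if all(c[1] < f[1] for f in front):
            front.append(c)
    return front

costs = [(10, 5.0), (12, 4.0), (11, 6.0), (10, 5.5), (15, 3.0)]
print(pareto_front(costs))  # [(10, 5.0), (12, 4.0), (15, 3.0)]
```

Because no single "best" pair exists, every node pair can carry many such frontiers, which is what makes precomputation and hierarchical generalization non-trivial.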
|
paper |
abstract |
bibtex
Given coarser-resolution projections from global climate models or satellite data, the downscaling problem aims to estimate finer-resolution regional climate data, capturing fine-scale spatial patterns and variability. Downscaling is any method to derive high-resolution data from low-resolution variables, often to provide more detailed and local predictions and analyses. This problem is societally crucial for effective adaptation, mitigation, and resilience against significant risks from climate change. The challenge arises from spatial heterogeneity and the need to recover finer-scale features while ensuring model generalization. Most downscaling methods fail to capture the spatial dependencies at finer scales and underperform on real-world climate datasets, such as sea-level rise. We propose a novel Kriging-informed Conditional Diffusion Probabilistic Model (Ki-CDPM) to capture spatial variability while preserving fine-scale features. Experimental results on climate data show that our proposed method is more accurate than state-of-the-art downscaling techniques. |
|
paper |
abstract |
bibtex
Given trajectories with gaps (i.e., missing data), we investigate algorithms to identify abnormal gaps in trajectories, which occur when a given moving object did not report its location but other moving objects in the same geographic region periodically did. The problem is important due to its societal applications, such as improving maritime safety and regulatory enforcement for global security concerns such as illegal fishing, illegal oil transfers, and trans-shipments. The problem is challenging due to the difficulty of bounding the possible locations of the moving object during a trajectory gap and the very high computational cost of detecting gaps in such a large volume of location data. The current literature on anomalous trajectory detection assumes linear interpolation within gaps, which may not detect abnormal gaps since objects within a given region may have traveled away from their shortest path. In preliminary work, we introduced an abnormal gap measure that uses a classical space-time prism model to bound an object's possible movement during the trajectory gap and provided a scalable memoized gap detection algorithm (Memo-AGD). In this paper, we propose a Space Time-Aware Gap Detection (STAGD) approach to leverage space-time indexing and merging of trajectory gaps. We also incorporate a Dynamic Region Merge-based (DRM) approach to efficiently compute gap abnormality scores. We provide theoretical proofs that both algorithms are correct and complete, along with an analysis of asymptotic time complexity. Experimental results on synthetic and real-world maritime trajectory data show that the proposed approach substantially improves computation time over the baseline technique. |
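The space-time prism's spatial projection is easy to sketch: an object last seen at p0 and next seen at p1 can only have visited points reachable within its speed budget, i.e., an ellipse with foci p0 and p1. This toy check (names and numbers are illustrative, not the paper's algorithm) shows the bounding idea:

```python
import math

def in_prism(p0, t0, p1, t1, q, v_max):
    """Spatial projection of a space-time prism: point q is reachable
    during a gap iff dist(p0, q) + dist(q, p1) <= v_max * (t1 - t0),
    i.e., q lies inside an ellipse with foci p0 and p1."""
    budget = v_max * (t1 - t0)
    return math.dist(p0, q) + math.dist(q, p1) <= budget

# A 1000 s gap at 10 m/s gives a 10 km travel budget between fixes
# 6 km apart; a point 2 km off the straight line is still reachable.
p0, p1 = (0.0, 0.0), (6000.0, 0.0)
print(in_prism(p0, 0, p1, 1000, (3000.0, 2000.0), 10.0))  # True
print(in_prism(p0, 0, p1, 1000, (3000.0, 5000.0), 10.0))  # False
```

The first query point lies inside the ellipse even though linear interpolation would place the object on the straight segment, which is why prism-based bounds can reveal anomalies that interpolation misses.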
|
paper |
abstract |
bibtex
Given multi-category point sets from different place-types, our goal is to develop a spatially-lucid classifier that can distinguish between two classes based on the arrangements of their points. This problem is important for many applications, such as oncology, for analyzing immune-tumor relationships and designing new immunotherapies. It is challenging due to spatial variability and interpretability needs. Previously proposed techniques require dense training data or have limited ability to handle significant spatial variability within a single place-type. Most importantly, these deep neural network (DNN) approaches are not designed to work in non-Euclidean space, particularly point sets. Existing non-Euclidean DNN methods are limited to one-size-fits-all approaches. We explore a spatial ensemble framework that explicitly uses different training strategies, including weighted-distance learning rate and spatial domain adaptation, on various place-types for spatially-lucid classification. Experimental results on real-world datasets (e.g., MxIF oncology data) show that the proposed framework provides higher prediction accuracy than baseline methods. |
|
paper |
abstract |
bibtex
We consider the problem of reducing the time needed by healthcare professionals to understand patient medical history via the next generation of biomedical decision support. This problem is societally important because it has the potential to improve healthcare quality and patient outcomes. However, navigating electronic health records is challenging due to high patient-doctor ratios, potentially long medical histories, the urgency of treatment for some medical conditions, and patient variability. Current electronic health record systems provide only a longitudinal view of patient medical history, which is time-consuming to browse, and doctors often need to engage nurses, residents, and others for initial analysis. To overcome this limitation, we envision an alternative spatial representation of patients' histories (e.g., electronic health records (EHRs)) and other biomedical data in the form of Atlas-EHR. Just as Google Maps allows a global, national, regional, and local view, Atlas-EHR may start with an overview of the patient's anatomy and history before drilling down to anatomical sub-systems, their individual components, or sub-components. Atlas-EHR presents a compelling opportunity for spatial computing since healthcare is almost a fifth of the US economy. However, traditional spatial computing designed for geographic use cases (e.g., navigation, land surveys, mapping) faces many hurdles in the biomedical domain. This paper presents a number of open research questions under this theme in five broad areas of spatial computing. |
|
paper
|
abstract |
bibtex
Given a collection of Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types, the goal is to find the subsets of feature types or their parents whose spatial interaction is statistically significant. This problem is important for taxonomy-reliant applications such as ecology (e.g., finding new symbiotic relationships across the food chain), spatial pathology (e.g., immunotherapy for cancer), retail, etc. The problem is computationally challenging due to the exponential number of candidate co-location patterns generated by the taxonomy. Most approaches for co-location pattern detection overlook the hierarchical relationships among spatial features, and the statistical significance of the detected patterns is not always considered, leading to potential false discoveries. This paper introduces two methods for incorporating taxonomies and assessing the statistical significance of co-location patterns. The baseline approach iteratively checks the significance of co-locations between leaf nodes or their ancestors in the taxonomy. An advanced approach uses the Benjamini-Hochberg procedure to control the false discovery rate, effectively reducing the risk of false discoveries while maintaining the power to detect true co-location patterns. Experimental evaluation and case study results show the effectiveness of the approach. |
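The Benjamini-Hochberg step-up procedure named in the abstract is standard and compact enough to sketch; the p-values below are made up for illustration.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return indices of hypotheses rejected at FDR level alpha.

    Sort p-values ascending; find the largest rank k with
    p_(k) <= (k / m) * alpha and reject the k smallest hypotheses.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])

# Five co-location candidates; only the strongest two survive FDR control.
pvals = [0.001, 0.013, 0.04, 0.2, 0.9]
print(benjamini_hochberg(pvals))  # [0, 1]
```

Note that 0.013 is rejected even though 0.013 > 0.05/5: BH is less conservative than Bonferroni, trading family-wise error control for higher power at a fixed false discovery rate.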
|
paper
|
abstract |
bibtex
Given a set S of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs <region (r_g), subset C of S> such that C is a statistically significant regional colocation pattern in r_g. This problem is important for applications in various domains, including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner [Subhankar et al., 2022] that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, the multiple comparisons regional colocation miner (MultComp-RCM), which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost. |
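The Bonferroni correction used by MultComp-RCM is the simplest multiple-comparisons guard: divide the significance level by the number of simultaneous tests. A toy sketch with made-up p-values:

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Reject hypothesis i iff p_i <= alpha / m, where m is the number
    of simultaneous tests (e.g., candidate region-pattern pairs)."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]

# With m = 4 candidates the per-test threshold drops to 0.0125,
# so only the first pattern remains significant.
print(bonferroni_reject([0.01, 0.02, 0.04, 0.5]))  # [0]
```

The correction is conservative, but it controls the family-wise error rate with no assumptions about dependence among the tests, and the single threshold also enables early pruning of candidates.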
|
paper |
abstract |
bibtex
Given trajectories with gaps, we investigate methods to tighten spatial bounds on areas (e.g., nodes in a spatial network) where possible rendezvous activity could have occurred. The problem is important for reducing manual effort to post-process possible rendezvous areas using satellite imagery and has many societal applications to improve public safety, security, and health. The problem of rendezvous detection is challenging due to the difficulty of interpreting missing data within a trajectory gap and the very high cost of detecting gaps in such a large volume of location data. Most recent literature presents formal models, namely space-time prism, to track an object's rendezvous patterns within trajectory gaps on a spatial network. However, the bounds derived from the space-time prism are rather loose, resulting in unnecessarily extensive postprocessing manual effort. To address these limitations, we propose a Time Slicing-based Gap-Aware Rendezvous Detection (TGARD) algorithm to tighten the spatial bounds in spatial networks. We propose a Dual Convergence TGARD (DC-TGARD) algorithm to improve computational efficiency using a bi-directional pruning approach. Theoretical results show the proposed spatial bounds on the area of possible rendezvous are tighter than that from related work (space-time prism). Experimental results on synthetic and real-world spatial networks (e.g., road networks) show that the proposed DC-TGARD is more scalable than the TGARD algorithm. |
|
paper |
abstract |
bibtex
Given trajectory data with gaps, we investigate methods to identify possible rendezvous regions. The problem has societal applications such as improving maritime safety and regulatory enforcement. The challenges come from two aspects. First, gaps in trajectory data make it difficult to identify regions where moving objects may have rendezvoused for nefarious reasons; traditional linear or shortest-path interpolation methods may not detect such activities, since objects in a rendezvous may have traveled away from their usual routes to meet. Second, detecting rendezvous regions involves a large number of gaps and associated trajectories, making the task computationally very expensive. In preliminary work, we proposed a more effective way of handling gaps and provided examples to illustrate potential rendezvous regions. In this article, we propose a refined Temporal Selection Search algorithm for finding a potential rendezvous region and an optimal temporal range to improve computational efficiency. We also incorporate two novel spatial filters: (i) a Static Ellipse Intersection Filter and (ii) a Dynamic Circle Intersection Spatial Filter. Both the baseline and proposed approaches account for every possible rendezvous pattern. We provide a theoretical evaluation of the algorithms' correctness and completeness along with a time complexity analysis. Detailed experiments on synthetic and real-world maritime trajectory data show that the proposed approach substantially improves area pruning effectiveness and computation time over the baseline technique; on synthetic data, accuracy improved by a substantial 50 percent over the baseline. We also report accuracy and precision on the synthetic dataset for both the proposed and baseline techniques. |
|
paper |
abstract |
bibtex
Given a collection of Boolean spatial feature-types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy on the feature-types, taxonomy-aware colocation pattern discovery finds the subsets of feature-types or their parents frequently located together. Taxonomy-aware colocations are important due to their use in taxonomy-reliant societal applications in ecology (e.g., finding new symbiotic relationships across the food chain), spatial pathology (e.g., immunotherapy for cancer), etc. Due to the taxonomy, the number of candidate patterns increases considerably (i.e., exponential in the number of colocated instances, where a subset of instances have a parent-child relation). Existing algorithms for mining general colocations are not designed to use a taxonomy and incur redundant computations across the hierarchy. We propose a taxonomy-aware colocation miner (TCM) algorithm which uses a user-defined taxonomy to find taxonomy-aware colocation patterns. We also propose a TCM-Prune algorithm that prunes duplicate colocation instances having a parent-child relation. Experiments with synthetic and real data sets show that TCM and TCM-Prune can find colocation patterns missed by the traditional approach (i.e., one that does not take the hierarchy into account), and that TCM-Prune can remove duplicate colocation instances. |
|
paper
|
abstract |
bibtex
Given trajectories with gaps (i.e., missing data), we investigate algorithms to identify abnormal gaps for testing possible hypotheses of anomalous regions. Here, an abnormal gap within a trajectory is defined as an area where a given moving object did not report its location, but other moving objects did periodically. The problem is important due to its societal applications, such as improving maritime safety and regulatory enforcement for global security concerns such as illegal fishing, illegal oil transfers, and trans-shipments. The problem is challenging due to the difficulty of interpreting missing data within a trajectory gap and the high computational cost of detecting gaps in such a large volume of location data. The current literature assumes linear interpolation within gaps, which may not detect abnormal gaps since objects within a given region may have traveled away from their shortest path. To overcome this limitation, we propose an abnormal gap detection (AGD) algorithm that leverages a space-time prism model under a space-time interpolation assumption. We then propose a refined memoized abnormal gap detection (Memo-AGD) algorithm that reduces comparison operations. We validated both algorithms using synthetic and real-world data. The results show that abnormal gaps detected by our algorithms give better estimates of abnormality than linear interpolation and can be used for further investigation by human analysts. |
|
paper |
abstract |
bibtex
Given a set S of spatial feature-types, its feature-instances, a study area, and a neighbor relationship, the goal is to find pairs <region (rg), subset C of S> such that C is a statistically significant regional colocation pattern in region rg. For example, Caribou Coffee and Starbucks are significantly co-located in Minneapolis but not in Dallas at present. This problem has applications in a wide variety of domains, including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. The current literature on regional colocation pattern detection has not addressed statistical significance, which can result in spurious (chance) pattern instances. In this paper, we propose a novel technique for mining statistically significant regional colocation patterns. Our approach determines regions based on geographically defined boundaries (e.g., counties), unlike previous works which employed clustering or regular polygons to enumerate candidate regions. To reduce spurious patterns, we perform a statistical significance test by modeling the observed data points with multiple Monte Carlo simulations within the corresponding regions. Using the SafeGraph POI dataset, this paper provides a case study on retail establishments in Minnesota to validate the proposed ideas. The paper also provides a detailed interpretation of discovered patterns using game theory and regional economics. |
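The Monte Carlo significance test follows a standard recipe: recompute the colocation statistic on many simulated null datasets and take the rank of the observed value as a p-value. This sketch is generic and illustrative; the null model, statistic, and counts are assumptions, not the paper's exact design.

```python
import random

def monte_carlo_pvalue(observed_stat, stat_fn, n_sims=999, seed=0):
    """Estimate a p-value by comparing an observed colocation statistic
    against the same statistic under n_sims simulated null datasets.
    Uses the standard (hits + 1) / (n_sims + 1) estimator."""
    rng = random.Random(seed)
    hits = sum(stat_fn(rng) >= observed_stat for _ in range(n_sims))
    return (hits + 1) / (n_sims + 1)

# Hypothetical null model: the colocation count of two 20-point
# features dropped independently in a unit square ("near" = within 0.1).
def null_stat(rng, n=20, d=0.1):
    a = [(rng.random(), rng.random()) for _ in range(n)]
    b = [(rng.random(), rng.random()) for _ in range(n)]
    return sum(1 for ax, ay in a for bx, by in b
               if (ax - bx) ** 2 + (ay - by) ** 2 <= d * d)

print(monte_carlo_pvalue(observed_stat=60, stat_fn=null_stat))
```

Restricting the simulations to each candidate region (rather than the whole study area) is what makes the test regional: the null preserves the region's own point density.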
|
paper |
abstract |
bibtex
Spatiotemporal data mining aims to discover interesting, useful, and non-trivial patterns in big spatial and spatiotemporal data. Such patterns are used in various application domains such as public safety, ecology, epidemiology, and Earth science. This problem is challenging because of the high societal cost of spurious patterns and the exorbitant computational cost. Recent surveys of spatiotemporal data mining need updating due to the field's rapid growth, and they did not adequately cover parallel techniques. This paper provides a more up-to-date survey of spatiotemporal data mining methods, along with a detailed survey of parallel formulations of spatiotemporal data mining. |
|
paper |
abstract |
bibtex
Given aggregated mobile device data, the goal is to understand the impact of COVID-19 policy interventions on mobility. This problem is vital due to important societal use cases, such as safely reopening the economy. Challenges include understanding and interpreting questions of interest to policymakers, cross-jurisdictional variability in choice and time of interventions, the large data volume, and unknown sampling bias. The related work has explored the COVID-19 impact on travel distance, time spent at home, and the number of visitors at different points of interest. However, many policymakers are interested in long-duration visits to high-risk business categories and understanding the spatial selection bias to interpret summary reports. We provide an Entity Relationship diagram, system architecture, and implementation to support queries on long-duration visits in addition to fine resolution device count maps to understand spatial bias. We closely collaborated with policymakers to derive the system requirements and evaluate the system components, the summary reports, and visualizations. |
|
paper
|
abstract |
bibtex
Given trajectory data with gaps, we investigate methods to identify possible rendezvous regions. Societal applications include improving maritime safety and regulations. The challenges come from two aspects. First, if trajectory data are not available around the rendezvous, either linear or shortest-path interpolation may fail to detect it. Second, the problem is computationally expensive due to the large number of gaps and associated trajectories. In this paper, we first use the plane sweep algorithm as a baseline. We then propose a new filtering framework using the concept of a space-time grid. Experimental results and a case study on real-world maritime trajectory data show that the proposed approach substantially improves Area Pruning Efficiency over the baseline technique. |
|
paper |
abstract |
bibtex
While climate models have evolved over time to produce high-fidelity, high-resolution climate forecasts, visualization and analysis of the model simulation output have been limited, typically constrained to one-dimensional charts for visualization and basic aggregate statistics for analytics. The same is true for the large troves of observational data available from meteorological stations all over the world. For a richer understanding of climate and the impact of climate change, one needs computational tools that allow researchers, policymakers, and the general public to interact with climate data. In this paper, we describe webGlobe, a browser-based GIS framework for interacting with climate data and other datasets available in a similar format. webGlobe is a unique resource that allows unprecedented access to climate data through a browser. The framework also allows for deploying machine-learning-based analytical applications on the climate data without putting a computational burden on the client. Instead, webGlobe uses a client-server framework, where the server, deployed on a cloud infrastructure, allows for dynamic allocation of resources for running compute-intensive applications. The capabilities of the framework are discussed in the context of a use case: identifying extreme events from real and simulated climate data using a Gaussian-process-based change detection algorithm. |
Modified version of template from here |