Application of Machine Learning and Deep Learning Methods in Geological Carbon Sequestration Across Multiple Spatial Scales


Virginia Tech


Given current technology and industrial systems, geological carbon sequestration (GCS) is a viable way to stabilize and further reduce carbon dioxide (CO2) concentrations while ensuring energy security. Pre-injection formation characterization and post-injection CO2 monitoring, verification, and accounting (MVA) are two critical and challenging tasks for ensuring effective sequestration. Both tasks can be accomplished using core analyses and well logging, which complement each other to provide accurate and sufficient subsurface information for pore-scale and reservoir-scale studies. In recent years, unprecedented data sources, increasing computational capability, and advances in machine learning (ML) and deep learning (DL) algorithms, which can capture highly complex nonlinear relationships between multivariate inputs and outputs, have offered new ways to extract knowledge from data. This work applied ML and DL methods to GCS-related studies at the pore and reservoir scales, including digital rock physics (DRP) and well-logging data interpretation and analysis.

DRP provides cost-effective and practical core analysis methods by combining high-resolution imaging techniques, such as three-dimensional (3D) X-ray computed tomography (CT) scanning, with advanced numerical simulations. Image segmentation is a crucial step of the DRP framework because it affects the accuracy of the subsequent analyses and simulations. We proposed a DL-based workflow for boundary and small-target segmentation in digital rock images, aiming to overcome partial volume blurring (PVB), the main challenge in X-ray CT image segmentation. Training data and model architecture are critical factors affecting the performance of supervised learning models. We employed entropy-based-masking indicator kriging (IK-EBM) to generate high-quality training data.
The performance of IK-EBM on PVB-affected segmentation was compared with that of several commonly used image segmentation methods on synthetic data with known ground truth. We then trained and tested a UNet++ model, which has a nested architecture and redesigned skip connections. The evaluation metrics include pixel-wise accuracies (F1 score, boundary-scaled accuracy, and pixel-by-pixel comparison) and physics-based accuracies (porosity, permeability, and CO2 blob curvature distributions). We also visualized the feature maps and tested the model's generalization.

The contact angle (CA) distribution quantifies rock surface wettability, which regulates multiphase behavior in porous media. We developed a DL-based CA measurement workflow by integrating an unsupervised learning pipeline for image segmentation with an open-source CA measurement tool. The image segmentation pipeline includes training a CNN-based unsupervised DL model constrained by feature similarity and spatial continuity. In addition, an over-segmentation strategy was adopted for model training, and post-processing was implemented to cluster the model output into the user-desired targets. The performance of the proposed pipeline was evaluated on synthetic data with known ground truth in terms of pixel-wise and physics-based evaluation metrics. The CA measurements obtained with the segmentation results as input were validated against manual CA measurements.

The GCS projects in the Illinois Basin, the first large-scale injection into saline aquifers, employed the latest pulsed neutron tool, the pulsed neutron eXtreme (PNX), to monitor the injected CO2 saturation. The well-logging data provide a valuable reference for formation evaluation and CO2 monitoring at the reservoir scale for GCS in saline aquifers. In addition, data-driven models based on supervised ML and DL algorithms provide a novel perspective on well-logging data analysis and interpretation.
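To make the distinction between the pixel-wise and physics-based segmentation metrics mentioned above concrete, the following minimal sketch computes a pixel-wise F1 score and the porosity of a binary pore/solid mask. It is an illustration under stated assumptions, not the workflow used in this work: the function names `f1_score_binary` and `porosity` and the tiny 4x4 masks are hypothetical, and real digital rock volumes are 3D and far larger.

```python
import numpy as np


def f1_score_binary(pred, truth):
    """Pixel-wise F1 score for the positive (pore) class of two boolean masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # pore predicted, pore in truth
    fp = np.count_nonzero(pred & ~truth)   # pore predicted, solid in truth
    fn = np.count_nonzero(~pred & truth)   # solid predicted, pore in truth
    denom = 2 * tp + fp + fn
    # Convention chosen here: two empty masks count as a perfect match.
    return 2 * tp / denom if denom else 1.0


def porosity(mask):
    """Porosity as the fraction of pixels (or voxels) labeled as pore."""
    return float(np.asarray(mask, dtype=bool).mean())


# Hypothetical 4x4 slice (1 = pore, 0 = solid); values are made up for illustration.
truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]], dtype=bool)
pred = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 1, 0]], dtype=bool)

f1 = f1_score_binary(pred, truth)
phi_true = porosity(truth)
phi_pred = porosity(pred)
print(f"F1 = {f1:.2f}, porosity (truth) = {phi_true:.2f}, porosity (pred) = {phi_pred:.2f}")
```

In this toy case the predicted porosity matches the ground truth exactly even though one pore pixel is misplaced, so the F1 score drops below 1. This is one reason a bulk physical quantity and a pixel-wise score are usefully reported together.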
We applied two commonly used ML and DL algorithms, support vector machine regression (SVR) and artificial neural network (ANN), to the well-logging dataset from GCS projects in the Illinois Basin. The dataset includes conventional well-logging data for mineralogy and porosity interpretation and PNX data for CO2 saturation estimation. Model performance was evaluated using the root mean square error (RMSE) and R² score between model-predicted and true values. The results showed that all the ML and DL models achieved high accuracy and efficiency. We also ranked the importance of the PNX features in the CO2 saturation estimation models using the permutation importance algorithm; formation sigma, pressure, and temperature were the three most significant factors.

The major challenge for CO2 storage field projects is large-scale, real-time data processing, including pore-scale core data and reservoir-scale well-logging data. Compared with traditional data processing methods, the ML and DL methods achieved both accuracy and efficiency. This work developed ML- and DL-based workflows and models for X-ray CT image segmentation and well-logging data interpretation based on the available datasets. The performance of the data-driven surrogate models was validated against comprehensive evaluation metrics. The findings fill the knowledge gap regarding formation evaluation and fluid behavior simulation across multiple scales, helping to ensure sequestration security and effectiveness. In addition, the developed ML and DL workflows and models provide efficient and reliable tools for processing massive GCS-related datasets and can be widely used in future GCS projects.
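The SVR evaluation and permutation-importance ranking described above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions rather than the models built in this work: the data are synthetic, the feature names are hypothetical placeholders (not PNX channels), and the hyperparameters (`C=10.0`, `epsilon=0.1`) are arbitrary choices for illustration. Only SVR is shown; an ANN regressor could be dropped into the same pipeline.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for a well-logging table (assumption: NOT real log data).
rng = np.random.default_rng(0)
feature_names = ["x0_dominant", "x1_minor", "x2_unused", "x3_unused"]  # hypothetical
X = rng.normal(size=(400, 4))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the inputs, then fit an RBF-kernel support vector regressor.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X_train, y_train)

# RMSE and R² between model-predicted and true values on held-out data.
y_pred = model.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
r2 = float(r2_score(y_test, y_pred))
print(f"RMSE = {rmse:.3f}, R2 = {r2:.3f}")

# Permutation importance: shuffle one feature at a time and record the score drop.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first
for idx in ranking:
    print(f"{feature_names[idx]}: {result.importances_mean[idx]:.3f}")
```

Permutation importance is model-agnostic, so the same call applies unchanged to an SVR or an ANN. Computing it on held-out data, as here, measures how much each feature contributes to generalization rather than to the fit on the training set.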



Geological carbon sequestration, machine learning, deep learning, X-ray CT image processing, digital rock physics, well-logging analysis