Yangkang Chen is a seismologist, research scientist, and professor at The University of Texas at Austin. Chen studies AI, signal processing, imaging, and inversion.
Successful supervised learning requires a large-scale, high-quality labeled dataset for continuous, benchmarked algorithm development. TXED is our newly published dataset, an important step toward benchmarked, fundamental AI research in seismology.
Traveltime calculation and ray tracing are fundamental to seismology, an earthquake science. Want to learn their basics and have some hands-on experience? Check out our newly developed open-source and easy-to-use Python package .
I have prepared an educational example for students on my GitHub page that trains an earthquake signal and noise discriminator from scratch. Many other tutorial notebooks on popular ML/AI applications in geoscience are on the way. Stay tuned!
Now, the pyekfmm package (for 3D anisotropic traveltime calculation) can be installed on most mainstream platforms, including Mac, Linux, Google Colab, and Windows (with MSVC as the build tool), with a simple "pip install pyekfmm", and it comes with many reproducible examples.
I'm thrilled to share that our ground-shaking research on the thickness of the mantle-transition zone (MTZ) in the West Pacific region has been published today in JGR-Solid Earth. Here is a glimpse of a comparison of different models, data, and 410/660 topography.
In a series of studies, we developed a high-resolution time-frequency transform that can reveal unprecedentedly detailed features from an earthquake seismogram. The source code is at .
In our latest work, we explored the potential of deep learning in peak ground acceleration (PGA) prediction and, surprisingly, showed that it is much more accurate than standard methods. This is significant for earthquake early warning. More details are at .
Deep learning 3D seismic denoising is no more than extracting the 3D features hidden in the data to better represent the input data and remove the unpredictable noise. The 3D features extracted from multi-dimensional seismic data below show clear coherent structures.
With our in-house AI-powered monitoring framework, we can confidently detect small earthquakes down to magnitude 0.384. (a) Waveforms. From top to bottom, the epicentral distance increases. The time shown at the top of the waveforms is the start time of the waveforms.
From a large earthquake, we can calculate the receiver functions, and with sophisticated processing and imaging approaches, we can recover a high-resolution deep-earth structure image. We're surprised by the existence of a possible "double Moho" structure. Any ideas on why?
Deep learning can accurately estimate earthquake magnitude using P-waves only. (a) The white triangles indicate seismic stations and the red circles indicate earthquakes. (b) The error distribution for the TexNet data. (c) Predicted vs. ground truth.
Everyone knows AI, but only some know how fascinating it can be in geoscience. TexNet might be the first (correct me if I'm wrong) earthquake monitoring network that is completely based on AI for production. The recall of magnitude >2 events is 100%, meaning that no such event is missed.
Pyekfmm is a Python package for 3D fast-marching-based traveltime calculation and its applications in seismology. A suite of fully reproducible DEMO scripts is provided so that beginner seismologists can reproduce every figure in our gallery.
Full-waveform moment tensor inversion helps decipher the earthquake source mechanism. However, most of the time, we obtain both good fits (small values) and bad fits (large values) at once, due to the problem's non-linearity and ill-posedness.
More details at .
Just came back from @JpGU and met a wonderful team of earthquake prediction (EQpre) scientists. This was the first time I learned that EQpre is not only the duty of seismologists but also of interest to a diverse team of interdisciplinary scientists, including geochemists and space scientists.
Our latest AI-powered focal mechanism inversion framework has been in production, enabling us to transform tens of thousands of TexNet catalog events into the same number of focal mechanism solutions. Here are 500 solutions in West Texas with a magnitude down to below Ml=2.
As easy as Python #Python, as fast as C: Pyekfmm is our latest open-source Python-and-C package for fast-marching traveltime calculation at multiple scales (local, regional, and global). Try it out with "pip install pyekfmm".
Our machine learning approach (EQCCT) helps unveil an unprecedentedly detailed characterization of seismicity in the Midland Basin, TX, despite the relatively sparse station coverage. The M 5.2 Range Hill event, located in zone 6, is by far the largest event in Midland.
Any ideas about an inverse correlation between seismicity and fluid injection rate? Having detected an enormous number of small earthquakes in Texas, we find a distinct drop in small-earthquake activity as the injection rate peaks.
For those who are interested in researching distributed acoustic sensing (DAS) data processing, here are two benchmark datasets and processing frameworks, and , with which we can hopefully accelerate the research development on DAS.
Advanced seismic processing techniques help reveal unprecedentedly detailed crustal structures (). a (one EQ) and d (11 EQs) correspond to the commonly used approach; b and d correspond to the new workflow. Any hints about the double Moho? Let me know.
Distributed acoustic sensing (DAS) data is complicated. This figure shows the anatomy of the noise components in the DAS data. As a result, we need to develop individual approaches to deal with these complicated noise components before we can better leverage the fiber optic data.
Deciphering the black box! Simple applications of deep learning are no longer fascinating. Instead, many are exploring why it can do an amazing job. We're among those deep explorers trying to uncover the black box. See a recent work from my collaborators
Deep learning full waveform inversion also suffers from inaccurate deep-layer structures due to the much weaker reflections from greater depths; see Figures D and F. More details are at and .
The time-frequency spectra of earthquakes and quarry blasts are dramatically different, which helps AI approaches distinguish them. Examples of (a) earthquake and (b) quarry blast. The corresponding scalogram for the (c) earthquake and (d) quarry blast.
While the whole world is focusing on major earthquakes near Taiwan and Japan or the rare and mysterious earthquake in New Jersey, quakes in Texas have been relentless recently. See, an M4.4 earthquake occurred just about one week ago near Midland, TX.
Revolutionary deep learning (DL) research requires a robust benchmark workflow and dataset. Here is an example. In a series of benchmarked research works on distributed acoustic sensing (DAS) data processing, we can finally achieve some magic performance like this.
Structure-oriented filtering plays a critical role in today's popular distributed acoustic sensing (DAS) data processing. Here is a demonstration of the principle of the structure-oriented median filter. (a) Definition of the local slope of seismic data. (b) Flattening principle.
Seismic station density matters! Even with our advanced deep-learning phase picker, EQCCT, the station density/number still plays a critical role in detecting ultra-small earthquakes and lowering the magnitude of completeness. This is a test in West Texas.
Distributed acoustic sensing can be used to monitor traffic volume in an urban environment. Here is an example. We see a significantly high volume (heavy traffic) in slow-traffic areas (like bridges and tunnel exits) and traffic jam periods (like 6 PM - 10 PM and 9 AM-11 AM).
Paper Alert! Stay tuned for our ground-shaking research on the thickness of the Mantle-Transition-Zone (MTZ) in the West Pacific region, coming out soon in JGR-SE. Here is a glimpse of a comparison of different models. We used an advanced SS precursor signal processing approach.
In our recent work, we deciphered the mystery of the incredibly effective performance of ground motion prediction inside the black-box deep-learning network. We find that the key feature of the seismograms can be gradually learned as the network deepens.
Regression problems are more challenging than classification problems for deep learning (DL) to solve. Here, we're applying DL for a complex regression task. This is a long-lasting (so far three years) project to tackle an urgent problem: peak ground acceleration (PGA) prediction
Here is an example of how a deepening encoder layer helps shape a multi-dimensional (2D/3D) seismic dataset into a more geologically reasonable pattern. In this case, the deeper, the better, but not always 😃
We have seen numerous applications of distributed acoustic sensing (DAS) in geophysical inversion and earthquake monitoring, but we seem to overlook an important application in urban traffic monitoring, which is different from inversion for near-surface structures.
Our SigRecover paper has just been published by Seismological Research Letters. The complete suite of reproducible codes and examples can be found at .
A combination of AI-powered picking (EQCCT) and PGA prediction (EQViT) modules enables a real-time ground motion prediction system that has been demonstrated to be effective in earthquake early warning (EEW) practice.
Seismic waveforms of the Saturday night M4.7 Southwest Texas earthquake. Earthquake waveforms in this area are distinct (due to the large-scale Eagle Ford Shale formation?) from earthquake waveforms in other parts of Texas.
We developed a method (SigRecover) for minimizing signal leakage by recovering useful signals from the removed noise. Such leakage has many causes (e.g., imperfect parameters or inadequate assumptions) in standard DAS data processing. The work is coming out soon; stay tuned!
The most complete Texas catalog so far is coming out (Stay tuned for its online version) thanks to our in-production AI phase picker (EQCCT). Here is a distribution of the events. In (a) CMEZ denotes Culberson and Mentone Seismic Zone, the most seismically active area in Texas.
Through a sophisticated workflow called structure-oriented interpolation that we developed earlier, we can recover a 3D subsurface model from a handful of sparsely distributed well logs like this.
Although we have learned that deep learning (e.g., a standard CNN architecture) with a good training dataset can be effective for an earthquake-quarry-blast classification problem, it is less widely known that the choice of deep learning architecture can make a huge difference.
Our AI-assisted catalog is finally out. We propose to apply an advanced deep learning method, the compact convolutional transformer (EQCCT), to unleash our power in analyzing hundreds of small earthquakes per day in West Texas. What a huge difference!
Time-frequency analysis (TFA) offers a perspective for evaluating the quality of seismic data, especially when the noise is so strong that signals are barely visible. We apply a series of denoising approaches from top to bottom. The denoising efficacy can be verified from left TFA
This figure shows the magic of transfer learning (TL) in seismic phase picking and earthquake detection. Left: our EQCCT model from a global database; Middle: refined EQCCT model using Texas data for transfer learning; Right: triggered station comparison. What a big difference!
Synthetic data tests are really helpful in earth science studies, where, in many cases, no ground truth is available. With synthetic data, we can mimic the real situation and have a solid way to evaluate the performance of any possible solution. See
The high-resolution catalog reveals a clear spatial b-value distribution of hydraulic-fracturing-induced microseismicity. Low b-values clearly delineate the reactivated faults (blue), and high b-values depict the stimulation-opened fractures (red).
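For readers new to b-values: the b-value is the slope of the Gutenberg-Richter relation log10 N = a - bM, and a common way to estimate it is the Aki maximum-likelihood formula. Below is a minimal sketch on a synthetic catalog. The function name `b_value_aki`, the completeness magnitude, and the catalog itself are assumptions for illustration; this is not the workflow behind the figure.

```python
import math
import random


def b_value_aki(magnitudes, completeness_mag, bin_width=0.0):
    """Aki maximum-likelihood b-value for events at or above completeness_mag.

    bin_width applies the usual half-bin correction for binned magnitudes;
    leave it at 0.0 for continuous magnitudes.
    """
    selected = [m for m in magnitudes if m >= completeness_mag]
    if len(selected) < 2:
        raise ValueError("need at least two events above the completeness magnitude")
    mean_mag = sum(selected) / len(selected)
    return math.log10(math.e) / (mean_mag - (completeness_mag - bin_width / 2.0))


# Synthetic catalog (hypothetical): under Gutenberg-Richter, M - Mc follows an
# exponential distribution with rate b * ln(10).
random.seed(0)
true_b = 1.0
mc = 0.5
catalog = [mc + random.expovariate(true_b * math.log(10)) for _ in range(20000)]

estimate = b_value_aki(catalog, mc)
print(f"estimated b-value: {estimate:.3f}")
```

Mapping b-values in space then amounts to repeating this estimate on the events that fall inside each spatial cell, provided each cell holds enough events above the completeness magnitude.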
With a 5D reconstruction technique arising from the reflection seismic community, we can reconstruct an extremely sparse dataset with >90% missing traces on a regular grid from top to middle. Further, with AI, we are able to reconstruct the bottom one. Amazing!
Pyseistr is a Python package for structural denoising and interpolation of multi-channel seismic data. The latest version includes both Python and C (hundreds of times faster) implementations of the embedded functions. Here is a DEMO for DAS.
Although the deep-learning earthquake magnitude estimator generally performs better, its stress-test results still follow common sense. MAE versus (a) SNR and (b) input window size. (c) Uncertainty versus magnitude. Lower SNR, shorter windows, and larger magnitudes mean larger errors.
Real-time machine-learning earthquake location shows substantially smaller errors in inland Japan, where the ray density is higher, than in coastal areas, where the ray density is obviously lower. Check more details at .
Distributed acoustic sensing (DAS) technology has witnessed numerous successes in a variety of scientific and engineering applications. We develop advanced processing methods to uncover extremely weak signals (see panel c) from DAS for earthquake analysis.
This is how we generate the faulted model for deep-learning inversion studies. (a) The layered model. (b) The model is shifted according to the fault line. (c) Obtained fault model.
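The three steps in the caption can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: a straight fault line, a constant vertical throw in samples, and equal-thickness layers. The function names `layered_model` and `add_fault` and all parameter values are hypothetical, not taken from the paper's code.

```python
import numpy as np


def layered_model(nz, nx, velocities):
    """(a) Horizontal layers of equal thickness; velocities listed top to bottom."""
    layer_index = np.arange(nz) * len(velocities) // nz
    column = np.asarray(velocities, dtype=float)[layer_index]
    return np.tile(column[:, None], (1, nx))


def add_fault(model, x_top, dip_deg, throw):
    """(b)-(c) Shift the block to the right of a straight fault line down by
    `throw` samples (throw >= 0). The fault starts at x_top on the surface."""
    nz, nx = model.shape
    z = np.arange(nz)
    fault_x = x_top + z / np.tan(np.radians(dip_deg))  # fault position at each depth
    faulted = model.copy()
    for ix in range(nx):
        shifted = np.roll(model[:, ix], throw)   # move the column downward
        shifted[:throw] = model[0, ix]           # refill the top with the first layer
        right_of_fault = ix > fault_x            # samples of this column past the fault
        faulted[right_of_fault, ix] = shifted[right_of_fault]
    return faulted


model = layered_model(nz=60, nx=80, velocities=[2.0, 2.5, 3.0])  # km/s, hypothetical
faulted = add_fault(model, x_top=30, dip_deg=60.0, throw=5)
print(faulted[22, 0], faulted[22, 79])
```

Calling `add_fault` repeatedly with different `x_top`, `dip_deg`, and `throw` values is one simple way to produce many training models with varied fault geometries.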
Seismic anisotropy is a wave propagation phenomenon that describes the dependence of wave speed (seismic velocity) on direction. This figure clearly shows the oval shape of the traveltime in a VTI (transverse isotropy with a vertical axis of symmetry) anisotropic medium.
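To see where the oval comes from, here is a minimal sketch for the simplest case: a homogeneous, elliptically anisotropic medium, where the traveltime from a source at the origin has a closed form. Elliptical anisotropy is a special case of VTI, and general VTI wavefronts are only approximately elliptical. The function name and velocities are hypothetical, and this is not the Pyekfmm API.

```python
import math


def elliptical_traveltime(x, z, v_horizontal, v_vertical):
    """Traveltime from a source at the origin to (x, z) in a homogeneous,
    elliptically anisotropic medium."""
    return math.sqrt((x / v_horizontal) ** 2 + (z / v_vertical) ** 2)


v_h, v_v = 3.0, 2.0  # km/s, hypothetical: waves travel faster horizontally

t_horizontal = elliptical_traveltime(3.0, 0.0, v_h, v_v)  # 3 km sideways
t_vertical = elliptical_traveltime(0.0, 2.0, v_h, v_v)    # 2 km straight down
print(t_horizontal, t_vertical)
```

Both points are reached at the same time, so the contour of equal traveltime is an ellipse stretched along the faster (horizontal) direction. In an isotropic medium (`v_h == v_v`) the same formula gives circles.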
Paper Alert!
Earthquake activities in areas across the Midland Basin and the Central Basin Platform of West Texas have significantly increased since mid-2019 because of continuing industrial activities involving wastewater injection. Check it out at
These are features learned through dictionary learning for three SOTA methods: (a) jDAS example, (b) IDF example, and (c) Wang example. The features highlighted by green circles clearly represent the signal characteristics. Code: .
We're often asked how to evaluate the earthquake location's success/reliability. Through a large and complete EQ catalog from AI, we expect to see a significantly more clustered catalog if we use a better location algorithm and velocity model, given the same P/S picks.
Even with a semi-uniform seismic acquisition system like USArray, we still miss a certain percentage of traces. With multi-dimensional reconstruction, we can hopefully recover those missing traces, as well as remove interfering noise (overlapping the key teleseismic phases).
S-wave velocity plays a crucial role in various applications but often remains unavailable in vintage wells. We propose an ML framework for estimating S-wave sonic logs from conventional logs, including P-wave sonic, gamma ray, total porosity, and bulk density.
Geological model building is critical to many solid earth research problems. In one of our earlier works, we develop quantitative methods to generate arbitrary geological models with folding, fault, and salt body structures. Check the paper out at .
More details here. Between 2022/11/28 and 2022/12/31, our EQCCT approach detected 3,350 events, with a minimum magnitude below -1 Ml and a maximum above 3.5 Ml. Specifically, in Culberson County, there are 45 M2 events after a manual check.
This is how we generate the salt model for deep-learning inversion studies. (a) The set of Gaussian curves used to simulate the intrusion of the salt body. (b) Salt body. (c-e) Layered velocity model, layers influenced by (a), and the obtained salt model.
Happened to find a video created many years ago explaining the seismic waves propagating through the subsurface, reflected at the elastic interface, and recorded by surface geophones. This is an informative short video to help understand seismic waves and seismic acquisition.
We hope that one day, the deep-earth image could be as informative as the reflection image below we generated using advanced imaging techniques. Our team will strive for that tirelessly! 💪
Looking forward to communicating with participants and fellow seismologists at the Japan Geoscience Union Meeting 2024, to be held between May 26-31, 2024, in Chiba, Japan.
Time flies! Since the first reported earthquake (texnet2017aleg, Ml=1.97) recorded on 2017-01-07, right after the establishment of the Texas Seismological Network (TexNet), it's been 7 years! More than 24322 events have been reported by TexNet.
Recall the fact that we actually don't know where earthquakes are exactly happening. We do not have ground truth at all. This strategy is really informative. Left/Right: Location algorithm 1 with a global average velocity/algorithm 2 with a regional velocity model.
This observation is against the mainstream assumption regarding the relationship between fluid injection rate and seismicity. It is commonly thought that increased fluid injection causes increased seismicity; this, however, is contradicted by what we have observed in our study.
There are no M<1.9 events from EQCCT with a final magnitude above 2, meaning that the recall of M2 events is 100%. The mean absolute error of magnitude estimation is 0.04 and the standard deviation is 0.33. The red dashed line indicates the mean magnitude difference (0.04).
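Recall only counts missed events, so extra detections do not lower it. A minimal sketch of the calculation, using hypothetical event IDs rather than TexNet data:

```python
def recall(reference_ids, detected_ids):
    """Fraction of reference events that also appear in the detected set."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference catalog is empty")
    hits = reference & set(detected_ids)
    return len(hits) / len(reference)


# Hypothetical IDs for illustration only.
reference_m2 = ["ev01", "ev02", "ev03", "ev04"]                # events that should be found
detected = ["ev01", "ev02", "ev03", "ev04", "ev05", "ev06"]    # includes extra detections

print(recall(reference_m2, detected))
```

A recall of 1.0 here says nothing about the two extra detections; judging those would require precision, which is a separate measure.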
In the past five years, I was fortunate to supervise Innocent Oboue as my first Ph.D. student, train a geologist who knew nothing about programming into a computational geophysicist, and finally achieve this ground-breaking goal after another five years (2019-2024).
(b) and (e) are processed with the proposed workflow. Arrows in (e) indicate improved structural details in the denoised RF profile. The black dashed lines mark the potential subsurface extension of major geological boundaries in this region. (c) and (f) Fold maps.
The stations are colored according to their installation time. (a) Detected events assuming the up-to-date station distribution. (b) Event magnitude distribution corresponding to (a). (c) Detected events assuming only 12 stations exist. (e) Detected events assuming only 8 stations.
@martijnende This is an interesting point! It seems that the picker is consistently making certain errors (although small), possibly due to shifting or aliasing inside the network, around every 6, 12, and 18 samples. We'll look into it. Thanks, Martijn, for your suggestion!
This is a test in Midland Basin, Texas. The MAEs in latitude, longitude, and depth are 0.0305 deg, 0.0280 deg, and 1.93 km, respectively. The median absolute errors in latitude, longitude, and depth are 0.00336 deg, 0.00354 deg, and 0.458 km, respectively.
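Reporting both the mean and the median absolute error is useful because the mean is sensitive to a few large outliers while the median is not. A minimal sketch with hypothetical depth values (not the Midland Basin results):

```python
from statistics import mean, median


def absolute_errors(predicted, reference):
    """Element-wise absolute differences between two equal-length sequences."""
    return [abs(p - r) for p, r in zip(predicted, reference)]


# Hypothetical depths in km; the last prediction is a deliberate outlier.
reference_depth = [5.0, 6.2, 7.1, 4.8, 9.0]
predicted_depth = [5.1, 6.0, 7.4, 4.9, 15.0]

errors = absolute_errors(predicted_depth, reference_depth)
mae = mean(errors)       # pulled up by the single 6 km miss
medae = median(errors)   # reflects the typical error

print(f"MAE = {mae:.2f} km, median AE = {medae:.2f} km")
```

When the mean is several times the median, it usually indicates that most events are located well and a small number of events carry large errors.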
The reported events are those detected by the traditional automatic picker at TexNet, short-term-average/long-term-average (STA/LTA), but not manually reviewed. They stand for the SOTA of EQ detection and picking. In this case, the AI picker clearly outperforms the SOTA.
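For context, the short-term-average/long-term-average idea is simple: compare the average signal energy in a short trailing window with that in a long trailing window, and declare a trigger when the ratio exceeds a threshold. Below is a textbook-style sketch on a synthetic trace. The window lengths, threshold, and function name are assumptions for illustration and do not reflect TexNet's configuration.

```python
import numpy as np


def sta_lta(trace, n_sta, n_lta):
    """STA/LTA ratio of signal energy using trailing windows (n_sta <= n_lta).

    Samples before the first full long window are left at zero.
    """
    energy = np.asarray(trace, dtype=float) ** 2
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    ratio = np.zeros(energy.size)
    idx = np.arange(n_lta - 1, energy.size)            # windows end at sample idx
    sta = (csum[idx + 1] - csum[idx + 1 - n_sta]) / n_sta
    lta = (csum[idx + 1] - csum[idx + 1 - n_lta]) / n_lta
    ratio[idx] = sta / np.maximum(lta, 1e-12)           # guard against division by zero
    return ratio


# Synthetic trace (hypothetical): weak noise plus a sinusoidal burst at `onset`.
rng = np.random.default_rng(0)
n_samples, onset = 2000, 1200
trace = 0.1 * rng.standard_normal(n_samples)
trace[onset:onset + 200] += np.sin(2 * np.pi * 0.05 * np.arange(200))

ratio = sta_lta(trace, n_sta=20, n_lta=400)
trigger = int(np.argmax(ratio > 4.0))                   # first sample above threshold
print(f"onset at {onset}, trigger at {trigger}")
```

The trigger lands a few samples after the true onset because the short window needs some signal energy before the ratio rises. A detector like this works well for clear arrivals but struggles when the signal is close to the noise level, which is where learned pickers are expected to help.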
The red and green lines mark the EQCCT-picked P- and S-wave arrivals. (b) Station and event locations on the map. (c) Station and event locations on the depth slice. Eight stations detected the event using EQCCT. This event was almost impossible to detect by analysts.
Distributed acoustic sensing (#DAS) in urban areas (e.g., Hangzhou in our latest research) can help us monitor traffic conditions. Panels 1–3 show examples of large passing cars that induce strong signals, and the last panel shows an example of a small car that induces weak signals.
The Pyekfmm package is one of the few packages that can calculate traveltime in anisotropic media. Check it out and play with a hands-on example at . DEMO figures are reproducible.
In addition, we performed waveform moment tensor inversion to determine earthquake source mechanisms; subsequently, we inverted for the regional stress field using the obtained source mechanisms.