13-15 June 2022 | 10:00-13:00 EDT // 16:00-19:00 CEST

Organizers: Natasha MacBean (1), Jana Kolassa (2), Andy Fox (3), Tristan Quaife (4), Hannah Liddy (5)
(1) Indiana University, (2) NASA GMAO, (3) Joint Center for Satellite Data Assimilation, (4) University of Reading, (5) Columbia University/NASA GISS
† Organized by the AIMES Land Data Assimilation Working Group

Workshop Overview: The goals of the workshop build on the principles of the AIMES Land DA Working Group to: 1) foster knowledge exchange across all groups working in land data assimilation and 2) build a community of practice and collaboration in land DA, particularly for addressing the technical challenges we face when implementing DA systems. We therefore welcome participation from a broad range of research interests including land surface states and fluxes (from carbon, energy, and water cycles to crop, fire, and land management), timescales (daily, seasonal to subseasonal, centennial/millennial), and scientific and practical applications (improving understanding of carbon-climate feedbacks, weather prediction, agricultural forecasting, and climate change impacts). The intended outcome of this workshop is increased collaboration and coordination within the land DA community to tackle technical challenges and promote the routine use of DA tools in the wider modeling community. This workshop also builds on the first land DA workshop, which is summarized in this meeting report: https://doi.org/10.1175/BAMS-D-21-0228.1

Deadline to Register: Wednesday, June 1st at 12:00 EDT 

Workshop Agenda

Monday, 13 June
Machine Learning in Land DA

9:50 AM EDT/ 15:50 CEST Coffee/tea time to join the conversation early and test out your camera and microphone. If you wish to use one, we are encouraging everyone to choose a different field work or earth observation photo each day (e.g. from the NASA or ESA image archives) as your background image on Zoom.
10:00 AM EDT/ 16:00 CEST Welcome from the Co-Chairs: Introduction to the workshop context and goals 
10:10 AM EDT/ 16:10 CEST Speaker 1: Sujay Kumar (NASA GSFC) – Use of advanced machine learning for improved exploitation of remote sensing information
10:30 AM EDT/ 16:30 CEST Speaker 2: Xu Shan (TU Delft) – Assimilating ASCAT dynamic vegetation parameters to constrain the plant water dynamics in land surface model
10:50 AM EDT/ 16:50 CEST Speaker 3: Timothée Corchia (CNRM) – Contribution of machine learning for the integration of satellite observations in a global model of the soil-plant system
11:10 AM EDT/ 17:10 CEST
11:20 AM EDT/ 17:20 CEST Speaker 4: Feng Tao (Tsinghua University) – PROcess-guided deep learning and DAta-driven modelling (PRODA) to uncover key patterns and mechanisms in global soil carbon dynamics
11:40 AM EDT/ 17:40 CEST Speaker 5: Philippe Peylin (CNRS-LSCE) – Comparative evaluation of different data assimilation approaches to optimize the parameters of the ORCHIDEE land surface model
12:00 PM EDT/ 18:00 CEST Speaker 6: Daiya Shiojiri (Chiba University) – Optimizing rain gauge locations based on data-driven sparse sensor placement
12:20 PM EDT/ 18:20 CEST  Break 
12:25 PM EDT/ 18:25 CEST Poster Session and Career Corner

1. Impact of uncertainties in meteorological forcing data on global estimates of monthly mean soil moisture and runoff
Mao Ouyang1*, Daiya Shiojiri, Shunji Kotsuki
1Center for Environmental Remote Sensing, Chiba University, Chiba, Japan

2. Inferring the Climate Response of Leaf Area Index and its Impacts on Net Biome Exchange Across Bioclimatic Zones
Alexander J. Norton1*, A. Anthony Bloom1, Nicholas C. Parazoo1, Paul A. Levine1, Shuang Ma1, Renato K. Braghiere1
1Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA

3. Modeling of Microwave Multi-Frequency Backscatter and Emission by a Community Land Active Passive Microwave Radiative Transfer Modeling Platform (CLAP)
Hong Zhao1*, Yijian Zeng1, Bob Su1, Jan G. Hofste1, Ting Duan1
1Faculty of Geo-information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands

4. Using soil moisture and land surface temperature Earth observations to optimize land surface model performance
Nina Raoult1*, Catherine Ottlé1, Philippe Peylin1, Vladislav Bastrikov1, and the ORCHIDAS team

5. Applying the Data Assimilation Research Testbed towards improved simulations of Earth System Carbon, Water and Energy Cycling
Brett Raczka1*, Jeffrey L. Anderson1, Andrew M. Fox2,  Xueli Huo3, Daniel Hagan4, Moha Gharamti1, Kevin Raeder1, Helen Kershaw1, Ben Johnson1
1National Center for Atmospheric Research
2Joint Center for Satellite Data Assimilation
3University of Arizona
4Nanjing University of Information Science & Technology

6. Parameter Optimization to Improve Seasonal Predictions of Evapotranspiration Partitioning in Semiarid Ecosystems
Kashif Mahmud1*, Nina Raoult2, Russell Scott3, Natasha MacBean1
1*Department of Geography, Indiana University, Bloomington, IN 47405, USA
2Laboratoire des Sciences du Climat et de l’Environnement, LSCE/IPSL, CEA-CNRS-UVSQ
3Université Paris-Saclay, Gif-sur-Yvette, F-91191, France

7. The Arctic Carbon Monitoring and Prediction System, a data assimilation system to reduce uncertainty of the permafrost-carbon climate feedback
Elchin Jafarov1*, Helene Genet2, Brendan Rogers1, Jennifer Watts1, Valeria Brionis1, Greg Fiske1, Susan Natali1
1Woodwell Climate Research Center, MA
2University of Alaska Fairbanks, AK

8. Current status and future plans for KIM land surface data assimilation
Sanghee Jun1*, Kyung-Hee Seol1, In-Hyuk Kwon1
1Korea Institute of Atmospheric Prediction Systems (KIAPS)

9. Assimilate leaf area index and biomass to constrain carbon dynamics in the Arctic and Boreal region
Xueli Huo1*
1University of Arizona, AZ

10. Weather and Water: Land DA developments using JEDI at the JCSDA
Andy Fox*1, James McCreight2, Amir Mazrooei2, Soren Rasmussen2, Tom Enzminger2, Greg Fall3, Mike Barlage4, Jiarui Dong5, Youlong Xia5, Clara Draper6, Sergey Frolov6, Tseganeh Gichamo7, Zofia Stanley7

1:00 PM EDT/ 19:00 CEST END

Tuesday, 14 June
Novel Observations and Approaches

9:50 AM EDT/ 15:50 CEST Coffee/tea time to join the conversation early and test out your camera and microphone. If you wish to use one, we are encouraging everyone to choose a different field work or earth observation photo each day (e.g. from the NASA or ESA image archives) as your background image on Zoom.
10:00 AM EDT/ 16:00 CEST Welcome from the Co-Chairs: Introduction to Day 2
10:05 AM EDT/ 16:05 CEST Speaker 1: Cédric Bacour (LSCE) – Assessing the complementarity of multiple datasets in constraining model estimates of net and gross global C budgets within a data assimilation framework
10:25 AM EDT/ 16:25 CEST Speaker 2: Yiqi Luo (Northern Arizona University) – Estimating spatially and temporally varying parameters of Earth system models with data assimilation and deep learning
10:45 AM EDT/ 16:45 CEST Break 
10:55 AM EDT/ 16:55 CEST Speaker 3: Paul A. Levine (Jet Propulsion Laboratory at Caltech) – Variable response of Amazon watersheds to climate and CO2 trends across environmental gradients
11:15 AM EDT/ 17:15 CEST Speaker 4: Shuang Ma (Jet Propulsion Lab at Caltech) – Resolving the carbon-climate feedback potential of high latitude wetland CO2 and CH4 exchanges
11:35 AM EDT/ 17:35 CEST Lightning talks: Land DA Group Roundtable

  1. Bethy/CCDAS
  3. ECMWF
  4. Environment Canada
  6. JCSDA
  7. JULES 
  8. Met Office
  10. NASA LIS
  11. NCAR CLM
  12. NOAA
12:00 PM EDT/ 18:00 CEST Q&A 
12:15 PM EDT/ 18:15 CEST Breakout Group Discussions

1. Characterizing human management features with remote sensing and data assimilation
Leads: Sujay Kumar and Manuela Girotto
This breakout session will focus on exploring data assimilation and modeling efforts to characterize the impact and drivers of anthropogenic processes such as irrigation, groundwater pumping, and reservoir management, as well as disturbances such as fires.

2. Land forecasting and the NEON Forecasting Challenge
Lead: Michael Dietze
What can we do, in general, to promote near-term and S2S forecasts of land processes and how can the land DA community get involved with the Ecological Forecasting Initiative’s NEON forecasting challenge in particular?

3. Non-linear filters for data assimilation
Lead: Prashant Kumar
This breakout session will consider the potential of, and need for, non-linear filters in land data assimilation. The goal of this breakout group is to explore the limitations of present land data assimilation techniques and the benefits and challenges of assimilating land observations using non-linear filters (like “Land data assimilation using particle filter”).

4. Co-developing DA education/course materials
Lead: Natasha MacBean
In this breakout group we will discuss opportunities for co-developing, as a Land DA Community, DA educational materials (or a possible short course) to help train early stage PhD students in DA methods, and/or to entice senior undergraduate and Masters students to pursue PhDs that would require knowledge and experience in DA.

5. Sensitivity / Uncertainty analysis of carbon dynamics in arctic terrestrial ecosystems
Lead: Hélène Genet
This breakout group will discuss methods, data availability and model comparison of sensitivity and uncertainty analysis focused on carbon dynamics in arctic and boreal ecosystems. The goal of this breakout group is to develop a sensitivity analysis that would be conducted by multiple modeling groups to evaluate how model sensitivity (and performance) is affected by model structure.

1:00 PM EDT/ 19:00 CEST  END

Wednesday, 15 June
Ensemble DA Methods 

9:50 AM EDT/ 15:50 CEST Coffee/tea time to join the conversation early and test out your camera and microphone. If you wish to use one, we are encouraging everyone to choose a different field work or earth observation photo each day (e.g. from the NASA or ESA image archives) as your background image on Zoom. 
10:00 AM EDT/ 16:00 CEST Welcome from the Co-Chairs: Introduction to Day 3 
10:05 AM EDT/ 16:05 CEST Speaker 1: Shunji Kotsuki (Chiba University) – Development of Portable Ensemble Data Assimilation Algorithm For Land, Atmosphere, and Coupled Data Assimilation
10:25 AM EDT/ 16:25 CEST Speaker 2: Yijian Zeng (University of Twente) – Impact of land model physics on estimating soil moisture and temperature with an Ensemble Transform Kalman Filter
10:45 AM EDT/ 16:45 CEST Break 
10:55 AM EDT/ 16:55 CEST Speaker 3: Kenneth J. Davis (Pennsylvania State University) – What do atmospheric inversions need from the Land DA community?
11:15 AM EDT/ 17:15 CEST Speaker 4: Michael Dietze (Boston University) – Assimilating Discrete Disturbance Events
11:35 AM EDT/ 17:35 CEST Speaker 5: Clara Draper (NOAA OAR ESRL PSL) – Generating ensembles for ensemble-based soil moisture data assimilation 
11:50 AM EDT/ 17:50 CEST Q&A 
11:55 AM EDT/ 17:55 CEST  Break
12:00 PM EDT/ 18:00 CEST  Breakout Group Report Backs and Plenary Discussion
1:00 PM EDT/ 19:00 CEST END


Abstracts for Day 1: Machine Learning in Land DA

Use of advanced machine learning for improved exploitation of remote sensing information
Sujay Kumar(1), Shahryar Ahmad(1), Goutam Konapala(1), Clara Draper(2)
(1) NASA GSFC
(2) NOAA
Abstract: Given the significant heterogeneity and complexity of the land surface, there are significant barriers to fully exploiting the information content of remote sensing datasets. While there has been significant progress in the use of land data assimilation methods, the majority of them still rely on retrieval products to incorporate remote sensing information into land surface models. This reliance on retrieval products, which have their own associated biases and uncertainties, has been limiting. In soil moisture data assimilation, for example, remote sensing retrievals are often rescaled to match the climatology of the model because of the large-scale systematic differences between the model and the retrieval estimates. These rescaling approaches lead to loss of information and are inadequate for handling dynamic changes in bias characteristics or cases where unmodeled processes are present. The use of optimization tools to reduce the systematic errors in the model is therefore desirable. Traditional calibration approaches, however, are computationally expensive, limiting their application over large domains or at fine spatial scales. Here we demonstrate the use of advanced machine learning tools for the effective reduction of systematic errors in a computationally efficient manner. The presentation will also discuss how machine learning can improve the information content of retrieval products. For example, though there has been a long legacy of passive microwave radiometry for snow mass estimation, most of the retrievals suffer from limited skill over mountains and forests and insufficient interannual variability. Advanced machine learning tools are more effective in exploiting the relative sensitivities in radiance measurements to address these limitations. The presentation will also describe how the machine learning applications provide inferences on improving model representations.
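The climatological rescaling described in this abstract is often done by matching the mean and standard deviation of the retrievals to those of the model. The following is a minimal sketch under that assumption, using synthetic data; the function name `rescale_to_model` and all numerical values are hypothetical and are not taken from the presentation.

```python
import numpy as np

def rescale_to_model(obs, model):
    """Linearly rescale retrievals so their mean and standard deviation
    match the model climatology (a simple stand-in for CDF matching)."""
    obs = np.asarray(obs, dtype=float)
    model = np.asarray(model, dtype=float)
    anomalies = (obs - obs.mean()) / obs.std()
    return model.mean() + model.std() * anomalies

rng = np.random.default_rng(0)
model_sm = 0.25 + 0.05 * rng.standard_normal(500)  # hypothetical model soil moisture
obs_sm = 0.10 + 0.02 * rng.standard_normal(500)    # hypothetical biased retrievals
scaled = rescale_to_model(obs_sm, model_sm)
```

Because the mapping is fixed once it is estimated, it cannot follow changes in the bias over time, which is the limitation the abstract points to.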

Assimilating ASCAT dynamic vegetation parameters to constrain the plant water dynamics in land surface model
Xu Shan(1), Susan Steele-Dunne(1), Manuel Huber(2), Sebastian Hahn(1), Wolfgang Wagner(1,3), Bertrand Bonan(4), Clement Albergel(4), Jean-Christophe Calvet(4), Ou Ku(5), Sonja Georgievska(5)
(1) Department of Geoscience and Remote Sensing, Faculty of Civil Engineering and Geosciences, TU Delft, Delft, the Netherlands
(2) Department of Water Management, Faculty of Civil Engineering and Geosciences, TU Delft, Delft, the Netherlands; now at European Space Agency, European Space Research and Technology Centre (ESTEC), 2201 AZ, Noordwijk, the Netherlands
(3) Department of Geodesy and Geoinformation (GEO), Vienna University of Technology, Vienna, Austria
(4) CNRM, Université de Toulouse, Météo-France, CNRS, Toulouse, France; now at European Space Agency Climate Office, ECSAT, Harwell Campus, Didcot, Oxfordshire, UK
(5) Netherlands eScience Center, Amsterdam, the Netherlands
Abstract: Our current ability to parameterize plant water dynamics in land surface models (LSMs) constrains our capacity to understand land-atmosphere processes and our ability to represent the response of ecosystems to drought (Powell et al., 2013). Microwave remote sensing datasets contain valuable information about plant water content variations from sub-daily to interannual scales under saturation or water stress (Konings et al., 2017; Steele-Dunne et al., 2019) and can be assimilated to constrain plant water dynamics in LSMs.
Recent research has shown that the backscatter-incidence angle relationship of the Advanced Scatterometer (ASCAT) data varies in response to changes in vegetation water content and phenology. In this study, we are working towards assimilating these data to constrain water dynamics in an LSM. First, we addressed the challenge of how to reconcile the states and parameters of the LSM with the satellite observations: a Deep Neural Network (DNN) was trained to link the ASCAT observables to the soil moisture in different layers and the vegetation-related states. Second, we assimilated ASCAT dynamic vegetation parameters into the ISBA-A-gs land surface model.
In a study over France from 2007 to 2019, the DNN is used to simulate the normalized backscatter as well as the slope and curvature of the backscatter-incidence angle relationship. Results show that the DNN has a near-zero bias for normalized backscatter and slope. A sensitivity analysis shows that ASCAT observables are sensitive to variations not only in surface soil moisture and LAI but also in root zone soil moisture, because of the dependency of plant water content on soil moisture in deeper layers.
The assimilation results show improved estimates of soil moisture and LAI. Furthermore, this method is highly transferable and lends itself to multi-observation assimilation. This paves the way to constrain vegetation water processes in LSMs using all available satellite data.

Contribution of machine learning for the integration of satellite observations in a global model of the soil-plant system
Timothée Corchia(1), Bertrand Bonan(1), Jean-Christophe Calvet(1), Gabriel Colas(1), Nemesio Rodriguez-Fernandez(2)
(1) CNRM, Université de Toulouse, Météo-France, CNRS, 31057, Toulouse, France
(2) Centre d’Etudes Spatiales de la Biosphère, CESBIO – CNES/CNRS/IRD/UPS
Abstract: In the context of climate warming, the frequency and intensity of extreme events such as droughts are increasing, and better modeling of the response of vegetation to climate is needed. Monitoring the impact of extreme events on terrestrial surfaces involves a number of variables of the soil-plant system such as surface albedo, soil water content, and vegetation leaf area index (LAI). These variables can be monitored either by using the unprecedented amount of data from the Earth observation satellite fleet or by using land surface models. Another solution consists of combining all available sources of information by assimilating satellite observations into models. In this work, C-band Advanced SCATterometer (ASCAT) radar backscatter (sigma0) and L-band Soil Moisture and Ocean Salinity (SMOS) vertical and horizontal brightness temperatures (V and H BT) satellite products are assimilated in the ISBA land surface model of Météo-France using the LDAS-Monde tool. First, observation operators are built using machine learning. Neural networks (NNs) are trained using the modeled surface soil moisture (SSM), soil temperature, rainwater interception by leaves, and satellite-derived LAI observations from Copernicus as inputs. The NNs are then used to find the statistical relationship between the input data and the satellite products, making LDAS-Monde capable of assimilating the satellite observations. It is shown that the assimilation of level 1 data alone is able to markedly improve the simulated LAI and SSM.

PROcess-guided deep learning and DAta-driven modelling (PRODA) to uncover key patterns and mechanisms in global soil carbon dynamics
Feng Tao(1), Yiqi Luo(2)
(1) Department of Earth System Science, Tsinghua University, Beijing, 100084, China
(2) Center for Ecosystem Science and Society, Department of Biological Sciences, Northern Arizona University, AZ, USA
Abstract: Soils are the largest organic carbon pool in the terrestrial ecosystem. Yet, key mechanisms that regulate soil organic carbon (SOC) formation and sequestration remain poorly understood. To better understand global SOC storage and its feedback to changing climate, we developed a novel PROcess-guided deep learning and DAta-driven modelling (PRODA) approach. PRODA integrates data assimilation, deep learning, big soil carbon datasets, and process-oriented models to best represent and understand global soil carbon dynamics. In an example that integrated 52,819 globally distributed vertical SOC profiles into the Community Land Model (CLM5), the PRODA-optimised model simulation explained 57% of the spatial variation in SOC content. Meanwhile, microbial carbon use efficiency (CUE) emerged as the pivot of global SOC storage and its spatial distribution compared with other mechanisms (e.g., decomposition, plant carbon input, and vertical transport). The findings revealed by PRODA enrich the classic paradigm, which focuses on SOC decomposition and organic carbon input, by adding microbial CUE to the understanding of global SOC formation and persistence. Moreover, the PRODA approach shows potential for gaining emergent understanding of the transient dynamics of SOC by integrating multiple sources of soil carbon data into process models. In an example at Harvard Forest, SOC sequestration from 1900 to 2010, after being constrained by both SOC content and soil radiocarbon data, showed higher efficiency with lower residence time than the results informed only by the radiocarbon data. In the future, integrating process-oriented models with different sources of global soil carbon datasets is essential to accurately quantify global soil carbon sequestration under climate change.

Comparative evaluation of different data assimilation approaches to optimize the parameters of the ORCHIDEE land surface model
Philippe Peylin(1), Nina Raoult(1), Maxime Carenso(1), Vladislav Bastrikov(1), Catherine Ottle(1), Maelle Coulon(1), James Salter(1), Cedric Bacour(1) and the ORCHIDAS group
(1) Laboratoire des Sciences du Climat et de l’Environnement (LSCE)
Abstract: For more than 10 years, different approaches have been developed by the international scientific community to optimize the parameters of biosphere models by assimilating different types of observations. These are essentially based on a Bayesian formalism with the minimisation of a cost function that takes into account all the errors associated with the model (structural errors and errors associated with the parameters) as well as the observations and our a priori knowledge of the parameters (assuming also Gaussian error distributions). Key examples include variational approaches (i.e., using a gradient method which requires the calculation of the sensitivity of the cost function to the parameters), Monte Carlo approaches (genetic algorithms, Markov chains, etc.) and “filter” approaches (e.g., Kalman filter, particle filter, etc.). Within the framework of the optimisation of the global continental surface model, ORCHIDEE, we have developed an assimilation system (ORCHIDAS) and mainly tested two methods (a gradient method and a genetic algorithm): see https://orchidas.lsce.ipsl.fr/. However, recent developments have highlighted alternative methods, based on ensemble filters or using physical model emulators, which offer advantages, particularly with regard to i) the numerical speed of the optimisation and ii) the ease of assimilating a set of observations of various natures. In this presentation, we look in particular at History Matching – a method based on emulation techniques developed by the uncertainty quantification community for the calibration of model parameters and successfully applied to climate models. We discuss the advantages of this technique (with respect to the gradient method and the genetic algorithm) and test the potential of this new approach in calibrating ORCHIDEE.
The test case consists of assimilating in situ water and carbon flux measurements (about 100 sites) and satellite proxies of vegetation activity (solar-induced fluorescence – SIF) to evaluate the respective performance of the different methods: level of fit to the observations, sensitivity to local minima of the cost function, computation time, etc.
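Under the Gaussian assumptions mentioned in this abstract, the Bayesian cost function is commonly written in the generic form below. The notation is an assumption introduced here for illustration and is not taken from the presentation: x is the parameter vector, x_b the prior parameter values, B the prior parameter error covariance, y the observations, M(x) the model output mapped to observation space, and R the combined observation and model error covariance.

```latex
J(\mathbf{x}) = \frac{1}{2}\,\bigl(\mathbf{y} - M(\mathbf{x})\bigr)^{\mathsf{T}} \mathbf{R}^{-1} \bigl(\mathbf{y} - M(\mathbf{x})\bigr)
              + \frac{1}{2}\,\bigl(\mathbf{x} - \mathbf{x}_b\bigr)^{\mathsf{T}} \mathbf{B}^{-1} \bigl(\mathbf{x} - \mathbf{x}_b\bigr)
```

Gradient methods minimise J using its derivative with respect to x, whereas Monte Carlo and emulator-based methods only need evaluations of J (or of the model) at sampled parameter values.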

Optimizing rain gauge locations based on data-driven sparse sensor placement
Daiya Shiojiri(1), Takumi Saito(1), Mao Ouyang(1), Shunji Kotsuki(1)
(1) Center for Environmental Remote Sensing, Chiba University
Abstract: Precipitation is one of the most important variables in hydrological studies because it provides the ultimate source of water resources, and occasionally causes severe disasters. Therefore, estimating spatio-temporal distributions of precipitation has been a key challenge in hydrological studies. Rain gauge stations provide essential ground truth data that is used to calibrate satellite/ground radar observations and numerical weather prediction models. However, there have been few studies that have explored optimization methods of rain gauge locations. For the rain gauge placement in this study, we use the data-driven sparse sensor placement (SSP) method, which has been developed in informatics science. This method determines the optimal sensor locations so that the selected sensors effectively determine coefficients of proper orthogonal decomposition (POD) modes. The original SSP method reconstructs the spatial patterns of data from the selected sensors by solving a linear inverse problem using the POD modes.
This study extends the existing SSP method to the problem of rain gauge placement by incorporating the singular values of the POD in addition to the POD modes. We also introduce two implementations to reconstruct spatial patterns of precipitation. One is the data assimilation approach, which can estimate the spatial patterns better than the simple linear inverse problem owing to Tikhonov regularization. The other is localization, which eliminates erroneous sampling noise and increases the rank of the background error covariance. We applied the proposed method to the placement of rain gauge observations over Hokkaido Island in Japan. We used 14-day accumulated precipitation from radar observations from 2006 to 2016 as training data, and determined the observation locations based on the POD modes of the training data. We then estimated spatial patterns of precipitation from the selected points by data assimilation and compared them with reference radar data for 2017-2018. The rain gauge locations optimized by the SSP method reconstruct more accurate spatial patterns of precipitation than the fields reconstructed with operationally distributed rain gauge locations.
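The general idea of sparse sensor placement on POD modes with a Tikhonov-regularized reconstruction can be sketched as follows. This is a generic illustration on synthetic one-dimensional data, not the authors' implementation: the greedy pivoting rule, the function names, the `ridge` parameter and the test patterns are all assumptions, and the singular-value weighting and localization described in the abstract are not included.

```python
import numpy as np

def pod_modes(snapshots, n_modes):
    """POD of a (n_time, n_grid) data matrix; returns modes (n_grid, n_modes)
    and the corresponding singular values."""
    anomalies = snapshots - snapshots.mean(axis=0)
    _, sing_vals, vt = np.linalg.svd(anomalies, full_matrices=False)
    return vt[:n_modes].T, sing_vals[:n_modes]

def greedy_sensors(modes):
    """Pick one grid point per mode by greedy column pivoting: take the
    point with the largest residual norm, then project it out."""
    residual = modes.T.copy()                  # shape (n_modes, n_grid)
    chosen = []
    for _ in range(modes.shape[1]):
        j = int(np.argmax(np.sum(residual**2, axis=0)))
        chosen.append(j)
        q = residual[:, j] / np.linalg.norm(residual[:, j])
        residual -= np.outer(q, q @ residual)
    return chosen

def reconstruct(modes, sensors, values, ridge=1e-3):
    """Estimate POD coefficients from sensor values with Tikhonov
    regularization, then rebuild the full anomaly field."""
    c = modes[sensors, :]
    lhs = c.T @ c + ridge * np.eye(c.shape[1])
    coeffs = np.linalg.solve(lhs, c.T @ values)
    return modes @ coeffs

# Synthetic training data built from three hypothetical spatial patterns.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 60)
patterns = np.stack([np.sin(np.pi * grid), np.cos(2 * np.pi * grid), grid**2])
training = rng.standard_normal((200, 3)) @ patterns
modes, sing_vals = pod_modes(training, n_modes=3)
sensors = greedy_sensors(modes)

# Reconstruct an unseen anomaly field from its values at the chosen points.
truth = rng.standard_normal(3) @ patterns - training.mean(axis=0)
estimate = reconstruct(modes, sensors, truth[sensors], ridge=1e-8)
```

In this noise-free example the field lies in the span of the three modes, so three well-chosen points are enough to recover it; with noisy observations a larger `ridge` value trades fit for stability.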

Abstracts for Day 2: Novel Observations and Approaches

Assessing the complementarity of multiple datasets in constraining model estimates of net and gross global C budgets within a data assimilation framework
Cédric Bacour(1), Natasha MacBean(2), Philippe Peylin(1), Frédéric Chevallier(1)
(1) LSCE, France
(2) Department of Geography – Indiana University Bloomington, USA
Abstract: Over the past decade, the application of data assimilation (DA) techniques has become a key component of land surface modelling. DA not only improves the parameterization of terrestrial biosphere models (TBMs) but can also help pinpoint some of their deficiencies. Because earlier DA studies mostly assimilated only one data-stream, the benefits and challenges of assimilating multiple datasets had to be explored. Indeed, a greater number and diversity of observations should provide stronger constraints on model parameters, including a wider range of processes, hence further reducing model uncertainty. However, a major challenge in the joint assimilation of multiple data-streams concerns the inconsistencies between observations and model outputs, which are usually not accounted for in common “bias-blind” Bayesian DA systems relying on the hypothesis of Gaussian errors. The likely impact of model-data biases on the parameter optimization is a degraded model performance as well as an illusory decrease in the estimated model uncertainty.
In this study, we illustrate the challenges of simultaneously assimilating multiple datasets related to the carbon cycle within the ORCHIDAS assimilation system associated with the ORCHIDEE TBM: net ecosystem carbon exchange and latent heat fluxes measured at eddy covariance sites across different ecosystems, satellite-derived Normalised Difference Vegetation Index, and monthly atmospheric CO2 concentration data measured at surface stations. To address the question of the compatibility between the data-streams, we conducted diverse assimilation experiments in which the different data-streams were assimilated alone or together. Hindcasts performed with these different calibrated models enabled us to quantify the relative model improvement with respect to each data-stream, and to identify whether a given dataset complements or contradicts the other data within the DA system and the ORCHIDEE model structure. We also present statistical diagnostics that were applied to check the consistency of the prior errors on model parameters and observations, and the information content brought by each individual data-stream within the joint assimilation framework.

Estimating spatially and temporally varying parameters of Earth system models with data assimilation and deep learning
Yiqi Luo(1)
(1) Northern Arizona University
Abstract: Earth system models (ESMs) generate great uncertainty partly because they use constant parameters. More and more evidence shows that model parameter values must vary over time and space to realistically simulate ecosystem dynamics. Indeed, parameter values that are estimated using data assimilation vary with sites and treatments in global change experiments. Varying parameters account for both processes at unresolved scales and changing properties of evolving systems. A model, no matter how complex it is, cannot represent all the processes of a system at resolved scales. Interactions of processes at unresolved scales with those at resolved scales should be reflected in model parameters. Meanwhile, it is pervasively observed that properties of ecosystems change over time, space, and environmental conditions. Parameters, which represent properties of a system under study, should change as well. Data assimilation estimates parameter values at individual sites. The site-level parameter estimates have to be upscaled by a deep learning model to predict spatially heterogeneous parameters at regional and global scales so that modelled and observed ecosystem dynamics are maximally matched.

Resolving the carbon-climate feedback potential of high latitude wetland CO2 and CH4 exchanges
Shuang Ma(1), A. Anthony Bloom(1), Gregory R. Quetin(2), Jennifer D. Watts(3), Zona Donatella(4), Eugenie Euskirchen(5), Alexander J. Norton(1), Yin Yi(6), Paul A. Levine(1), Nicholas C. Parazoo(1), John R. Worden(1), Charles E. Miller(1), David S. Schimel(1)
(1) Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, USA
(2) Department of Geography, University of California, Santa Barbara, USA
(3) Woodwell Climate Research Center, Falmouth, MA, USA
(4) San Diego State University, San Diego, CA, USA
(5) Institute of Arctic Biology, University of Alaska Fairbanks, Fairbanks, AK, USA
(6) Division of Geological and Planetary Sciences, California Institute of Technology, Pasadena, CA, USA.
High-latitude wetlands are key stores of organic carbon (C) and play a major role in the greenhouse gas balance of high-latitude ecosystems. The carbon-climate feedback potential of high-latitude wetlands remains poorly understood, not least because of uncertainty in the competing temperature and precipitation controls on carbon dioxide (CO2) uptake and on the decomposition of soil C into CO2 and methane (CH4) fluxes. In particular, while CH4 fluxes typically account for a smaller component of the C balance, the climatic impact of CH4 outweighs that of CO2, given its 28-34 times larger Global Warming Potential (GWP) on a 100-year timescale, highlighting the need to jointly resolve the climatic sensitivities of both CO2 and CH4. To quantitatively assess the carbon-climate feedback potential of wetland ecosystems, we developed a simple Joint-CO2-CH4-Respiration scheme (JCR) in a terrestrial biosphere model (DALEC) and used a data-model integration approach (CARbon Data Model fraMework, CARDAMOM) to produce a data-constrained analysis of environmental controls of carbon exchange and its sensitivity to inter-annual variations and trends in climate change at seven high-latitude wetland eddy covariance sites. The observation-optimized model accurately represents seasonal and inter-annual variability of CH4 and CO2 fluxes. Based on observation-constrained model processes, we perturb meteorological forcings to quantify the sensitivity of CH4 and CO2 fluxes to potential inter-annual variations and trends in precipitation and temperature. Overall, we find that (i) precipitation, rather than temperature, dominates the NEE and CH4 sensitivities to climate through soil moisture, and (ii) the sign of the GWP response reverses depending on precipitation levels in warming scenarios. A warmer and drier climate may decrease total GWP by 0.01 ± 0.02 gCO2/m2/day, and a warmer and wetter climate may increase GWP by 0.05 ± 0.03 gCO2/m2/day in these high-latitude wetland ecosystems.
We demonstrate that joint observational constraints on CO2 and CH4 are key to understanding high-latitude ecosystem responses in the coming decades, and we highlight the need to reduce uncertainty in both (a) CO2 and CH4 biogeochemistry and (b) climatic changes in the coming decades, to improve assessment of wetland carbon-climate feedback potential.
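Illustrative sketch (not from the abstract): the total GWP figures above combine CO2 and CH4 fluxes on a common CO2-equivalent basis. The short Python example below shows that arithmetic with the 28-34 GWP range quoted in the abstract. The flux values and the function name co2_equivalent_flux are hypothetical, and the calculation is only assumed to resemble what the authors did.

```python
# GWP of CH4 relative to CO2 on a 100-year timescale (range quoted in the abstract).
GWP100_CH4_RANGE = (28.0, 34.0)


def co2_equivalent_flux(nee_co2, ch4_flux, gwp_ch4):
    """Combine fluxes into gCO2-equivalent/m2/day.

    nee_co2  : net CO2 exchange in gCO2/m2/day (negative means uptake)
    ch4_flux : CH4 emission in gCH4/m2/day
    gwp_ch4  : mass-based GWP of CH4 relative to CO2
    """
    return nee_co2 + gwp_ch4 * ch4_flux


# Hypothetical daily fluxes for a wetland site that is a CO2 sink.
nee = -1.0   # gCO2/m2/day
ch4 = 0.05   # gCH4/m2/day

low, high = (co2_equivalent_flux(nee, ch4, g) for g in GWP100_CH4_RANGE)
print(f"CO2-equivalent flux: {low:.2f} to {high:.2f} gCO2e/m2/day")
```

With these made-up numbers the site takes up CO2 but has a positive CO2-equivalent flux once CH4 is weighted by its GWP, which is why the two gases need to be constrained jointly.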

Gaussian process emulators for efficient Bayesian calibration of process-based models
Paul A. Levine(1), A. Anthony Bloom(1), Alexandra G. Konings(2), Matthew Worden(2), Shuang Ma(1), Renato Braghiere(2), Alexander Norton(2), Nicholas Parazoo(2)
(1) Jet Propulsion Laboratory, California Institute of Technology
(2) Stanford University, Department of Earth System Science
Abstract: Bayesian calibration allows informing land surface models (LSMs) with data from multiple sources and scales, iteratively updating analyses as new data become available, propagating uncertainty into model predictions, and dealing with complex systems. While the primary aim of the calibration is constraining uncertainties in the model parameters, associated analyses help identify missing processes, feedback mechanisms or state variables.
Traditional Bayesian calibration algorithms, however, fail to leverage high-performance computing environments optimized for parallel computation, even though advances in computing power increasingly come from larger numbers of processors rather than faster CPUs. This is more than an inconvenience: most LSMs are simply too slow to be plugged into algorithms that require thousands to millions of sequential model evaluations.
To overcome this challenge, we established an emulator-based Bayesian calibration framework in which the emulator, which is orders of magnitude faster than the original computer simulator, is used in place of the full model and passed to the Bayesian calibration algorithm. In this approach, the time-limiting steps of running the full model are reduced and parallelized.
We use a Gaussian process (GP) model as our statistical emulator. The GP always passes exactly through the design points and allows estimation of the uncertainty associated with interpolating between design points. Key features of this approach include emulating the error surface instead of the model outputs, proposing and refining training points strategically, and modifying the calibration algorithm to account for the uncertainty in the GP.
The gains in computation time from emulator-based calibration are shown to be substantial, opening opportunities to explore more complex statistical models at the hierarchical level. We generalized and implemented the emulator-based Bayesian calibration and multi-site hierarchical Bayesian calibration workflows as part of an ecological informatics toolbox, PEcAn, whose distributed architecture facilitates community collaboration. We also discuss current limitations of the approach, potential solutions, and more advanced applications that are in progress.
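Illustrative sketch (not the PEcAn implementation): the workflow described above can be outlined in a few lines of Python with NumPy. Everything named here is an assumption made for illustration: slow_model is a toy stand-in for an expensive LSM, the error surface is taken to be the sum of squared errors, the GP uses a squared-exponential kernel with fixed hyperparameters, and emulator uncertainty enters the sampler by drawing the emulated error from the GP predictive distribution. The strategic refinement of training points mentioned in the abstract is omitted, so a single coarse design like this one gives only a rough posterior.

```python
import numpy as np


def slow_model(theta):
    """Hypothetical stand-in for one expensive land surface model run."""
    t = np.linspace(0.0, 1.0, 20)
    return theta[0] * np.exp(-theta[1] * t)


def sq_exp_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between the rows of a and b."""
    d2 = (((a[:, None, :] - b[None, :, :]) / length_scale) ** 2).sum(axis=2)
    return variance * np.exp(-0.5 * d2)


class GPEmulator:
    """Noise-free GP regression with fixed (not optimized) hyperparameters."""

    def __init__(self, x, y, length_scale, jitter=1e-6):
        self.x, self.length_scale = x, length_scale
        self.mean, self.variance = y.mean(), y.var()
        k = sq_exp_kernel(x, x, length_scale, self.variance)
        # A small jitter keeps the Cholesky factorization numerically stable,
        # so the GP interpolates the design points only up to that jitter.
        self.chol = np.linalg.cholesky(k + jitter * self.variance * np.eye(len(x)))
        self.alpha = np.linalg.solve(
            self.chol.T, np.linalg.solve(self.chol, y - self.mean))

    def predict(self, x_new):
        """Predictive mean and variance at the rows of x_new."""
        k_star = sq_exp_kernel(x_new, self.x, self.length_scale, self.variance)
        v = np.linalg.solve(self.chol, k_star.T)
        var = np.clip(self.variance - (v ** 2).sum(axis=0), 0.0, None)
        return self.mean + k_star @ self.alpha, var


def metropolis_on_emulator(emulator, lower, upper, sigma_obs, n_steps, rng):
    """Metropolis sampler that calls only the emulator (uniform prior in bounds)."""
    def log_post(th):
        mu, var = emulator.predict(th[None, :])
        # Draw the error from the GP predictive distribution so that
        # emulator uncertainty is carried into the acceptance step.
        sse_draw = max(rng.normal(mu[0], np.sqrt(var[0])), 0.0)
        return -0.5 * sse_draw / sigma_obs ** 2

    theta = 0.5 * (lower + upper)
    current = log_post(theta)
    step = 0.05 * (upper - lower)
    chain = np.empty((n_steps, len(theta)))
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal(len(theta))
        if np.all(proposal >= lower) and np.all(proposal <= upper):
            candidate = log_post(proposal)
            if np.log(rng.random()) < candidate - current:
                theta, current = proposal, candidate
        chain[i] = theta
    return chain


rng = np.random.default_rng(0)
sigma_obs = 0.3
obs = slow_model(np.array([2.0, 3.0])) + rng.normal(0.0, sigma_obs, 20)
lower, upper = np.array([0.5, 0.5]), np.array([4.0, 6.0])

# Step 1: run the full model at the design points. These runs are independent,
# so on a cluster they could all execute in parallel.
design = lower + (upper - lower) * rng.random((60, 2))
sse = np.array([np.sum((slow_model(th) - obs) ** 2) for th in design])

# Step 2: emulate the error surface rather than the model outputs.
emulator = GPEmulator(design, sse, length_scale=(upper - lower) / 5.0)

# Step 3: the sequential sampler now touches only the cheap emulator.
chain = metropolis_on_emulator(emulator, lower, upper, sigma_obs, 2000, rng)
print("posterior mean (emulated):", chain[500:].mean(axis=0))
```

The point of the design is that the 60 full-model runs are the only expensive step and can be distributed across processors, while the 2000 sequential sampler steps cost only small matrix operations.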

Abstracts for Day 3: Ensemble DA methods

Development of Portable Ensemble Data Assimilation Algorithm For Land, Atmosphere, and Coupled Data Assimilation
Shunji Kotsuki(1)
(1) Chiba University
The ensemble Kalman filter (EnKF) is an advanced data assimilation method that uses the flow-dependent forecast error covariance estimated from an ensemble of model forecasts. Among the various kinds of EnKF, the ensemble transform Kalman filter (ETKF; Bishop et al. 2001; Hunt et al. 2007) is efficient for parallel computation and has been widely used for Earth system models, including atmosphere, ocean, and land surface models. Our group has been developing an ETKF-based data assimilation algorithm (https://github.com/skotsuki/speedy-lpf). In addition to the classical ETKF, this algorithm incorporates a local particle filter and its Gaussian mixture extension (Kotsuki et al. 2022) and a hybrid background error covariance model (Kotsuki and Bishop 2022), expressed in the form of the ETKF. We have been developing this ETKF-based algorithm for the global atmospheric data assimilation system known as NEXRA, which currently runs operationally on JAXA’s third-generation supercomputing system. The ETKF-based algorithm has the additional advantage that weakly coupled or strongly coupled land-atmosphere data assimilation can be represented easily by regulating the ensemble transform matrix for the land and atmospheric components.
This talk introduces our ensemble data assimilation algorithm and its developmental concept. We also show its applications to global atmospheric and land-atmospheric data assimilation experiments. For example, we are exploring the optimal coupled land-atmosphere data assimilation method in NEXRA for improving weather and hydrological forecasts by assimilating soil moisture data. We found that updating atmospheric variables by assimilating soil moisture data improves soil moisture analysis and forecasts and mitigates a warm temperature bias in the lower troposphere where a dry soil moisture bias exists. However, updating soil moisture by assimilating atmospheric observations has detrimental impacts on soil moisture analysis and forecasts. This talk also introduces our recent work on land data assimilation of satellite-sensed land surface temperature for the Japanese land surface model SiBUC.
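Illustrative sketch (not the speedy-lpf code): a minimal ETKF analysis step in the ensemble-space form of Hunt et al. (2007), written in Python with NumPy. The two-variable state (one soil moisture value and one near-surface temperature value), the synthetic ensemble, the diagonal observation error covariance, and the function name etkf_transform are all assumptions. Treating "weakly coupled" as applying the transform matrix only to the component that was observed is one simple reading of the abstract, not necessarily the authors' formulation, and localization and inflation are left out.

```python
import numpy as np


def etkf_transform(hx_ens, y_obs, r_var):
    """Return the k x k matrix W such that Xa = xb_mean + Xb' @ W.

    hx_ens : (m, k) ensemble mapped to observation space
    y_obs  : (m,) observations
    r_var  : (m,) observation error variances (diagonal R assumed)
    """
    k = hx_ens.shape[1]
    yb_mean = hx_ens.mean(axis=1, keepdims=True)
    yb_pert = hx_ens - yb_mean                      # (m, k)
    c = yb_pert.T / r_var                           # (k, m), equals Yb'^T R^-1
    a = (k - 1) * np.eye(k) + c @ yb_pert           # inverse of Pa in ensemble space
    evals, evecs = np.linalg.eigh(a)
    pa_tilde = evecs @ np.diag(1.0 / evals) @ evecs.T
    # Symmetric square root of (k - 1) * Pa gives the perturbation weights.
    w_pert = evecs @ np.diag(np.sqrt((k - 1) / evals)) @ evecs.T
    w_mean = pa_tilde @ c @ (y_obs[:, None] - yb_mean)   # (k, 1) mean weights
    return w_mean + w_pert                          # mean weights added to each column


rng = np.random.default_rng(1)
k = 20
# Synthetic background ensemble: drier soil goes with warmer air.
soil = 0.25 + 0.05 * rng.standard_normal(k)                          # m3/m3
temp = 290.0 - 40.0 * (soil - 0.25) + 0.5 * rng.standard_normal(k)   # K
xb = np.vstack([soil, temp])                                         # (2, k)

# Assimilate a single soil moisture observation.
y = np.array([0.30])
r_var = np.array([0.02 ** 2])
W = etkf_transform(xb[:1, :], y, r_var)

xb_mean = xb.mean(axis=1, keepdims=True)
# Strongly coupled: the same transform updates land and atmosphere.
xa_strong = xb_mean + (xb - xb_mean) @ W
# Weakly coupled (in this sketch): only the land component is transformed.
xa_weak = xb.copy()
xa_weak[0] = xa_strong[0]

print("temperature mean, background:", temp.mean())
print("temperature mean, strongly coupled:", xa_strong[1].mean())
```

Because the transform acts on ensemble members rather than on a particular state variable, switching between coupling strategies only changes which rows of the state receive which transform matrix.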

Impact of land model physics on estimating soil moisture and temperature with an Ensemble Transform Kalman Filter
Yijian Zeng(1), Bob Su(1)
(1) University of Twente
This paper introduces STEMMUS (a model that considers coupled liquid, vapor, dry air, and heat transport in soil) together with a data assimilation platform to examine how model complexity affects performance in estimating soil moisture and soil temperature in an arid environment over a sub-weekly time period. The different model complexities were obtained by including or excluding coupling mechanisms in STEMMUS: the diffusion-based mechanism (DM), the coupled moisture and heat transport mechanism that includes vapor flow (DMV), and the comprehensively coupled mechanism that also includes dry airflow (DMVA). The results show that model physics plays only a minor role in soil moisture estimation when soil moisture observations are dense. Even for sparse soil moisture observations (>6 hr observation interval), neither the complex nor the simple model shows an obvious advantage in estimating soil moisture in the data assimilation system. However, the choice of the observation interval at which soil moisture data are assimilated is important for the soil moisture assimilation result, especially when the soil experiences wetting-drying cycles: the earlier the soil moisture response to such a cycle is assimilated, the better the estimates. For soil temperature, model complexity does affect the data assimilation results. The complex model performs better than the simple model in estimating soil temperature, and the simple model cannot constrain soil temperature dynamics in deeper layers when soil temperature observations are limited.

What do atmospheric inversions need from the Land DA community?
Kenneth J. Davis(1)
(1) The Pennsylvania State University
Abstract: Atmospheric inversions require prior flux estimates. Land data assimilation systems can provide these prior flux estimates. What are some of the features that could be provided by the land DA community that would make the operators of atmospheric inversion systems smile? I will provide a review of our group’s efforts at land DA aimed toward improving prior flux estimates for atmospheric inversions. I will also present a “wish list” from the atmospheric inversion community. My aim is to open a discussion that will increase collaboration between the land DA and atmospheric inversion communities.

Assimilating Discrete Disturbance Events
Michael Dietze(1)
(1) Boston University
Abstract: Current approaches to bottom-up disturbance monitoring rely heavily on the detection of land-use, land-use change, and forestry (LULUCF) through remote sensing, but often account for ecosystem impacts using simple look-up tables. By contrast, process models are frequently used to analyze and predict disturbance dynamics in greater detail. Once observations are available, however, we need to update predictions, especially for stochastic processes such as disturbance. State data assimilation (SDA) is designed specifically to update predictions, nudging modeled states back toward reality in proportion to the uncertainties in the model and the data, but current SDA algorithms are designed to update continuous states, not discrete disturbances. Here we develop a new Bayesian SDA algorithm that combines a discrete Multinomial state-and-transition framework with conventional ensemble filtering SDA approaches. To demonstrate the potential for assimilating disturbance, we applied the Multinomial SDA to the Very Simple Ecosystem Model (VSEM), performing both simulated data experiments with known disturbances and testing the algorithm against real-world disturbances detected in the LandTrendr data product for central Oregon.
With simulated disturbances, we demonstrate the ability not only to detect discrete disturbance events but also to avoid false positives. We also demonstrate the ability to fuse multiple data types to successfully distinguish different disturbance types, and to probabilistically capture vegetation type ‘switching’ events within the assimilation and ensemble forecast. To apply this to real-world data, we calibrated VSEM against eddy-covariance and ancillary data from the AmeriFlux US-Me2 tower. We then selected 356 conifer forest sites for testing, using the LandTrendr disturbance product to stratify by four disturbance types (cut, burn, pest, and other). We then assimilated the 30 m LandTrendr annual aboveground biomass (AGB) product from 1990-2017 and assessed the rate of disturbance detection. Assimilating just AGB, our assimilation was sensitive to disturbances that reduced biomass by 1.5 kg/m2 but underpredicted defoliation disturbances, which we expect would be improved by also assimilating leaf area index (LAI). Moving forward, the SDA framework provides an exciting opportunity to fuse multiple data sources to holistically improve real-time disturbance detection, impact assessment (e.g. carbon sequestration), and forecasts of both disturbance events and post-disturbance recovery within a single integrated system.
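Illustrative sketch (not the authors' algorithm): the idea of updating a discrete disturbance state from a continuous observation can be shown with a single Bayesian update in Python. A categorical prior over disturbance classes is combined with a Gaussian likelihood of the observed AGB under each class. The three classes, their fractional biomass losses, the prior probabilities, and the function name disturbance_posterior are all hypothetical, and the ensemble filtering part of the Multinomial SDA is not represented.

```python
import numpy as np

# Hypothetical disturbance classes and the fraction of AGB each one removes.
CLASSES = ("none", "partial loss", "stand-replacing")
FRAC_LOSS = np.array([0.0, 0.3, 0.9])


def disturbance_posterior(prior, agb_forecast, agb_obs, obs_sd):
    """Posterior class probabilities given one AGB observation (kg/m2)."""
    expected = agb_forecast * (1.0 - FRAC_LOSS)      # AGB expected under each class
    log_like = -0.5 * ((agb_obs - expected) / obs_sd) ** 2
    log_post = np.log(prior) + log_like
    log_post -= log_post.max()                       # guard against underflow
    post = np.exp(log_post)
    return post / post.sum()


prior = np.array([0.96, 0.03, 0.01])   # disturbance is rare in any given year
forecast = 12.0                        # forecast AGB before assimilation, kg/m2

# Observation close to the forecast: no disturbance should be flagged.
p_quiet = disturbance_posterior(prior, forecast, agb_obs=11.5, obs_sd=1.0)
# Observation far below the forecast: a stand-replacing event becomes likely.
p_event = disturbance_posterior(prior, forecast, agb_obs=2.0, obs_sd=1.0)

for name, a, b in zip(CLASSES, p_quiet, p_event):
    print(f"{name:16s} quiet year: {a:.3f}   event year: {b:.3f}")
```

Weighting the likelihood by a prior in which disturbance is rare is what keeps ordinary observation noise from being flagged as an event, and adding a second data type (for example LAI) would amount to multiplying in a second likelihood term.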

Generating ensembles for ensemble-based soil moisture data assimilation
Clara Draper(1)
Abstract: This presentation will review aspects related to the generation of ensembles for ensemble-based land data assimilation, for both offline and coupled land/atmosphere systems, using examples drawn from the development of a new land data assimilation system for the NOAA National Centers for Environmental Prediction (NCEP) global data assimilation and numerical weather prediction (NWP) system. Several different schemes for perturbing the soil (moisture and temperature) states in NCEP’s cycling NWP / data assimilation system have been tested, starting with the approaches used in offline land data assimilation systems. Offline systems typically account for model uncertainty in ensembles by perturbing a selection of atmospheric forcing and model state variables. In most cases, the perturbed atmospheric forcing is generated by adding statistically-generated perturbations to a single atmospheric realization (say from a model forecast or observations, or a combination of both). However, most atmospheric reanalysis and NWP systems are now ensemble-based, and ensembles of forecasts from different atmospheric realizations are now available. While the atmospheric fields used to force land models are generally under-dispersed in these ensembles, it is beneficial to use these fields in place of perturbing a single atmospheric realization, since this ensures internal consistency between the atmospheric variables in each ensemble member, while also providing more accurate spatial variation in the model forcing uncertainty. It is also shown that adding perturbations to the soil moisture states, as is often done in offline systems, generates unrealistic spatial patterns in the resulting ensemble spread.
By contrast, perturbing the land model parameters, in this case vegetation fraction, generates a more realistic distribution in the ensemble spread, while also inducing perturbations in the land and atmosphere that are consistent with errors in the land/atmosphere fluxes. The latter is important since it leads to ensemble error cross-covariances that reflect the uncertainty in the fluxes that determine the land/atmosphere coupling. By contrast, perturbation methods that target only one component (say adding perturbations to either the atmospheric or land states) will lead to overestimated ensemble error covariances where that component is driving the coupling between the components, and underestimated covariances where the other component is driving the coupling.