Thesis defense: Decoding chaos: a cryo-EM walk through Physarum polycephalum heterogeneous cell extracts

Location

IBS seminar room

By Eymeline Pageot (IBS/Microscopic Imaging of Complex Assemblies Group)

Cryo-electron microscopy (cryo-EM) has become a central method in structural biology for determining high-resolution structures of macromolecular assemblies. However, most cryo-EM studies still rely on highly purified samples, which limits the range of biological systems that can be investigated. In recent years, the analysis of cell extracts by single particle cryo-EM has emerged as an alternative strategy to explore the structural diversity of cellular proteins directly from complex mixtures. Despite its potential, this approach remains technically challenging due to the high level of heterogeneity present in such samples and the difficulty of identifying proteins without complementary techniques such as mass spectrometry.

The work presented in this thesis investigates the feasibility of using cryo-EM as a primary discovery tool to determine and identify protein structures directly from fractionated cell extracts. The study focuses on the plasmodial slime mould Physarum polycephalum, a non-model eukaryotic organism whose structural proteome is largely unexplored. In contrast to most previous studies in the field, protein identification relied solely on structural information derived from cryo-EM reconstructions.

Cell extracts from P. polycephalum were fractionated using biochemical separation methods and analysed by cryo-EM. Because such samples contain a wide range of proteins with different sizes, shapes, and abundances, dedicated image processing strategies were developed to address the challenges posed by particle heterogeneity and flexibility. Particular attention was given to 2D classification, particle sorting, and iterative particle picking strategies to improve the recovery of multiple particle orientations and increase the number of particles contributing to 3D reconstructions.

Using these approaches, fourteen macromolecular assemblies were identified and solved directly from the heterogeneous samples. In cases where the cryo-EM maps reached sufficient resolution, atomic models could be built ab initio. For lower-resolution structures, identification relied on structural comparison with known protein folds and structures predicted by generative tools. This work demonstrates that protein identity and putative function can be inferred directly from the structural information, opening the possibility of structure-based genome annotation in poorly characterised organisms.

Beyond the individual structures determined in this study, this work also outlines the methodological challenges associated with large-scale structural exploration of complex biological mixtures. The largely manual nature of several steps in the workflow currently limits the throughput of the approach. Strategies to automate key stages of the pipeline, including particle classification, initial model generation, and structural identification, are therefore discussed as important directions for future development.

Finally, this work highlights the importance of community-driven analysis and open data sharing for the continued development of large-scale structural exploration approaches. Making both the raw cryo-EM data and the solved structures publicly available allows other researchers to investigate the dataset further, potentially identifying additional structures, improving the analysis with new computational methods, and developing those methods themselves. In the longer term, combining such "shotgun" cryo-EM approaches with advances in computational modelling and high-throughput data analysis may contribute to the systematic structural characterisation of cellular proteomes.