EMBL Alumna Leverages Data Skills for AI-Powered Research

Artificial intelligence is rapidly reshaping scientific discovery and promises to accelerate breakthroughs across the life sciences, and skilled data management is proving essential to unlocking its full potential. EMBL alumna Laura Clarke exemplifies this intersection, having carried her expertise from large-scale genomic projects at the European Bioinformatics Institute to a role at BenchSci. There, she applies data coordination skills to the development of AI-powered tools that help scientists navigate complex data and speed up discovery. Clarke shares how her EMBL experience laid the foundation for this work, and what the scientific community can do to foster the next generation of AI-driven innovation.

Laura Clarke’s Current Role and BenchSci’s AI Platform

Laura Clarke is currently a Senior Project Manager at BenchSci, a company that uses artificial intelligence to accelerate life science research and development. BenchSci’s core product, ASCEND, uses machine learning to analyze scientific publications from all stages of research, identifying elements such as proteins, diseases, and antibodies. By surfacing contextual evidence and predicting potential risks, the platform helps scientists pinpoint suitable methods and reagents, which streamlines experiment design and improves R&D efficiency.

Clarke’s background at EMBL-EBI, where she managed large-scale projects like the 1000 Genomes Project and the Human Cell Atlas, was foundational to her current role. She honed skills in data management and coordination, learning how scientists approach data and the types of information they require. This experience directly informs her work at BenchSci, where she bridges the gap between bioinformatics/machine learning teams and user needs, ensuring the AI platform is both technically robust and practically relevant to scientific workflows.

Open access data is critical for BenchSci’s AI development, and the company makes extensive use of resources like Ensembl and UniProt to enrich its platform. While proprietary databases and publisher contracts supplement these, publicly available labelled data – detailing proteins, pathways, and sequences – is invaluable for “bootstrapping” innovation and demonstrating a concept’s potential without significant upfront investment. Clarke also emphasizes the need for trustworthiness in AI, particularly as it expands into diagnostics, and is excited about the potential for AI to propose new research hypotheses, accelerating discovery beyond simply uncovering existing knowledge.

EMBL Experience in Data Management and Large-Scale Projects

EMBL’s experience with large-scale genomic projects – like the 1000 Genomes Project, Human Cell Atlas, and HipSci – has been foundational for advancements in data-driven research. These initiatives generated petabytes of data requiring robust management strategies, encompassing sample tracking, phenotype standardization, and ontology development. This expertise isn’t confined to genomics; the lessons learned in coordinating these complex datasets directly translate to other ‘omics’ fields and are vital for training the machine learning algorithms powering modern AI applications.

A key takeaway from EMBL’s work is the critical importance of open access data for AI development. Machine learning thrives on large, labelled datasets, and publicly available resources like Ensembl and UniProt serve as essential seed data. This approach reduces initial investment and accelerates innovation, allowing researchers to validate concepts before substantial funding is committed. EMBL’s commitment to data sharing directly facilitates the bootstrapping of AI initiatives across life science research.

EMBL alumni, like Laura Clarke, demonstrate how expertise in data management bridges the gap between research and application. Clarke’s experience coordinating large projects provided valuable insight into the needs of both academic and industry scientists, enabling her to contribute to the development of AI platforms like BenchSci’s ASCEND. This platform leverages machine learning to accelerate R&D, highlighting how EMBL’s focus on data infrastructure fosters impactful translation of scientific discovery.

The Importance of Open Data for AI Advancement

Open data is foundational to advancing artificial intelligence, particularly in fields like biomedicine. Machine learning algorithms require vast datasets for effective training; without freely accessible, labelled data – identifying proteins, pathways, or genomic sequences – innovation is severely hampered. BenchSci, for example, leverages resources like Ensembl and UniProt to connect information, demonstrating the value of publicly available archives. This access allows both academic and commercial entities to “bootstrap” ideas and demonstrate potential before significant funding commitments, accelerating the pace of discovery.

Large-scale projects like the Human Cell Atlas (HCA) and the 1000 Genomes Project are prime examples of how open data fuels AI development. These initiatives generate massive, complex datasets that, when openly shared, become invaluable training resources. Professionals skilled in data management, like EMBL alumna Laura Clarke, are crucial to coordinating these efforts and ensuring data quality. Their expertise bridges the gap between data generation and the development of effective AI applications in areas such as diagnostics and therapeutic research.

Access to data is not the only challenge for AI progress; building trust in AI-driven insights matters just as much. While AI excels at pattern recognition and prediction, it lacks inherent understanding. Ensuring public awareness of these limitations, especially as AI moves into sensitive fields like healthcare, is vital. The future lies in using AI’s speed to accelerate in silico experimentation, helping scientists narrow their research focus and increase lab efficiency, but this requires responsible development and transparent data practices.

Future Potential and Challenges of AI in Research

AI is rapidly transforming research, particularly in the life sciences, by drawing on openly available “big data.” Companies like BenchSci are building platforms, such as ASCEND, that use machine learning to analyze scientific publications and identify key elements like proteins and antibodies. This accelerates research by helping scientists pinpoint suitable methods and reagents, streamlining experiment design. The ability to process vast datasets – including resources like Ensembl and UniProt – allows for quicker knowledge discovery and a more efficient research pipeline, moving beyond simple data retrieval to contextual analysis.

A critical factor enabling AI’s progress is open access to data. Machine learning algorithms require extensive, labelled datasets for training. Publicly available information that identifies proteins, pathways, or gene sequences is invaluable, saving time and resources otherwise spent on curation. BenchSci relies heavily on these resources, alongside proprietary databases, to power its AI platform. This shared foundation allows both academic and industrial researchers to build on existing knowledge and validate ideas before significant investment.

Despite the promise, building trustworthy AI remains a major hurdle. It is crucial to understand that AI, including advanced large language models, excels at pattern recognition – not human-like reasoning. As AI moves into sensitive areas like diagnostics, addressing bias and ensuring accuracy are paramount. The most exciting prospect is AI that moves beyond finding existing knowledge to generating novel hypotheses, dramatically accelerating in silico experimentation and potentially transforming the pace of scientific discovery.

Quantum News

