Student Projects — SDE Team Development Hub

Student Projects Summary

Our cohort of students is undertaking short projects supported by the SDE Team Development Hub. See the table below for a summary, followed by a selection of reflective student case studies.

Site	Course	Project title	Dates	Student name
Lancaster	MSci Computer Science	Cloud Data Engineering and Graph-Based Data Lineage for the NHS Secure Data Environment	Jan–Mar 2025	Waleed Ali
Manchester	Masters in Artificial Intelligence	AI support for statistical disclosure checks and synthetic data	June–Sept 2025	Tejal Ravikumar Yekkula
Manchester	MSc Artificial Intelligence	AI support for statistical disclosure checks and synthetic data	June–Sept 2025	Nouar Nagem
Manchester	MSc Health Data Science	Designing a secure architecture that blends on-premises and cloud based computation infrastructure for use in SDE	June–Sept 2025	Emaan Hajara
Lancaster	MSc Computer Science '25-'26	Kubernetes microservices development and integration	Jan–Mar 2025	James Gardener
Manchester	MSc Health Data Science	Hardware enabled Confidential Compute in the Health Data Science Research Domain	June–Sept 2026	Rushil Singh
Lancaster	MSc Computer Science	Fine tuning LLMs using LoRA for RTT clock stop detection	Jan–Mar 2026	Hevin Patel
Lancaster	MSc Cyber Security	Securing KARECTL, A Kubernetes NHS Research Analytics Platform using Identity Based Network Policies and Policy as Code Governance	Jan–Mar 2025	Ambalika Kakoty

Student experiences

Waleed Ali

Cloud Data Engineering and Graph-Based Data Lineage for the NHS Secure Data Environment

Site:Lancaster

Course:MSci Computer Science

Biography:I'm in my final year of an MSci in Computer Science at Lancaster University. Before this placement I hadn't really touched cloud engineering tools professionally — most of my experience was through university coursework in things like data mining and distributed systems. I tend to want to understand how things work under the hood rather than just at the surface, which is partly what drew me to this role. I'm also working towards the Microsoft AZ-900 and DP-900 certifications, which the NHS offered to sponsor during the placement.

Project Summary:I spent 10 weeks working remotely as a Data Engineer Intern in the Real World Evidence team at Lancashire Teaching Hospitals, as part of the OneSDC programme — a joint NHS initiative to build a shared Secure Data Environment (SDE) for research and analytics. The placement followed a structured programme covering Azure Data Factory, Databricks with PySpark, ADLS Gen2, and GitHub. This led into a final project where I built an end-to-end ETL pipeline using a Medallion architecture (Bronze, Silver, Gold) with Synthea synthetic patient data, orchestrated through ADF with transformation logic in separate Databricks notebooks and a dashboard on top.
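To illustrate the Medallion pattern described above, here is a minimal sketch in plain Python rather than PySpark. It is not the pipeline built during the placement: the rows, field names, and the `synthea_demo` tag are hypothetical stand-ins, and each function only shows what a Bronze, Silver, or Gold step typically does.

```python
from collections import Counter

# Hypothetical raw rows standing in for a Synthea-style patients extract.
RAW_ROWS = [
    {"id": "p1", "gender": "F", "birth_year": "1980"},
    {"id": "p2", "gender": "m", "birth_year": "1975"},
    {"id": "p2", "gender": "m", "birth_year": "1975"},  # duplicate record
    {"id": "p3", "gender": "F", "birth_year": ""},      # missing birth year
]


def to_bronze(rows):
    """Bronze: land the raw data unchanged, adding only a source tag."""
    return [dict(row, source="synthea_demo") for row in rows]


def to_silver(bronze):
    """Silver: deduplicate on id, drop incomplete rows, normalise types."""
    seen, silver = set(), []
    for row in bronze:
        if row["id"] in seen or not row["birth_year"]:
            continue
        seen.add(row["id"])
        silver.append({
            "id": row["id"],
            "gender": row["gender"].upper(),
            "birth_year": int(row["birth_year"]),
        })
    return silver


def to_gold(silver):
    """Gold: aggregate into a small reporting table (patients per gender)."""
    return dict(Counter(row["gender"] for row in silver))


bronze = to_bronze(RAW_ROWS)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping each layer as its own function mirrors the idea of holding transformation logic in separate notebooks, so an orchestrator can run and retry the stages independently.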

Tejal Ravikumar Yekkula

AI support for statistical disclosure checks and synthetic data

Site:Manchester

Course:Masters in Artificial Intelligence

Biography:I completed my undergraduate degree in AI and Machine Learning at BNM Institute of Technology, co-authoring "Face Recognition using MTCNN, Inception-ResNet with Ensemble Approach", which proposed a deep learning model combining advanced face detection with ensemble classification. My final-year project, SurgiLearnVR, a VR medical training platform, won the Best Project Award.

My journey includes internships with Oracle Financial Services Software and PwC, applying AI, machine learning, and cloud technologies to improve systems. These experiences sharpened my technical expertise while giving valuable insights into how AI can enhance efficiency and trust in both healthcare and financial systems.

I am passionate about mentorship and outreach, supporting diversity in tech initiatives, and accessibility in AI. I envision a future where AI is not only powerful but also ethical, inclusive, and accessible, and am committed to contributing to that vision.

Project Summary:My project focused on creating an AI agent that can generate synthetic healthcare data, motivated by the challenge of accessing real patient data for research, which is often restricted due to privacy concerns. By producing realistic but entirely fictional medical records, the agent offers researchers a safe way to test ideas, build models, and explore healthcare questions without exposing sensitive information.

I combined advanced AI language models with smart data design so the agent could respond to simple prompts like "generate 50 asthma patients with their prescribed medications." The generated datasets were then carefully compared with real ones to ensure they looked and behaved realistically while still protecting privacy. I found that AI is especially effective at capturing important clinical data such as age, gender, and conditions, though some hospital-related details were less precise.

The research demonstrated that AI-driven synthetic data is a powerful tool for healthcare innovation, offering a balance between usefulness and privacy. It opens the door for safer, faster research and highlights how AI can be applied responsibly in sensitive fields like medicine.

Nouar Nagem

AI support for statistical disclosure checks and synthetic data

Site:Manchester

Course:MSc Artificial Intelligence

Biography:I recently completed both a BSc and an MSc in Artificial Intelligence, building strong skills in programming, machine learning, and data engineering along the way. I enjoy learning about technology and exploring how AI can be applied in different fields, especially healthcare. Outside my studies, I developed skills in data analysis, project coordination, and stakeholder collaboration, strengthening both my technical and organisational abilities.

Project Summary:I developed an AI agent that understands natural language queries and generates synthetic healthcare data. It is designed to handle queries from uploaded datasets; to support medical synonyms and abbreviations using Pinecone with OpenAI embeddings and NHS SNOMED terms; and to produce fully fictional, privacy-preserving patient records with the OpenAI API.
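The synonym and abbreviation matching described above rests on nearest-neighbour search over embeddings. The sketch below shows that idea only, under loud assumptions: the three-dimensional vectors are hand-made stand-ins for real embeddings, the terms are illustrative rather than actual SNOMED entries, and no Pinecone or OpenAI calls are made.

```python
import math

# Hand-made 3-dimensional vectors standing in for real embeddings.
# The terms and numbers are illustrative assumptions, not project data.
TERM_VECTORS = {
    "asthma": [0.9, 0.1, 0.0],
    "hypertension": [0.1, 0.9, 0.1],
    "type 2 diabetes mellitus": [0.0, 0.2, 0.9],
}

# Hypothetical user phrasings, already "embedded" by hand.
QUERY_VECTORS = {
    "high blood pressure": [0.2, 0.8, 0.1],
    "T2DM": [0.1, 0.1, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_term(query_vector, term_vectors):
    """Return the stored term whose vector is most similar to the query."""
    return max(
        term_vectors,
        key=lambda term: cosine_similarity(query_vector, term_vectors[term]),
    )


for phrase, vector in QUERY_VECTORS.items():
    print(phrase, "->", nearest_term(vector, TERM_VECTORS))
```

A vector database plays the role of `nearest_term` at scale, which is why a synonym or abbreviation can resolve to a clinical term without an exact string match.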

Supervision:The supervision from the SDE Team Development Hub was supportive, with weekly meetings providing guidance while allowing independent work. This project gave me hands-on experience developing a complex AI agent, interpreting natural language queries, and generating structured synthetic data. It strengthened my problem-solving skills, improved how I communicate complex ideas, and increased my confidence in pursuing a career in technical development of IT infrastructure.

Emaan Hajara

Designing a secure architecture that blends on-premises and cloud based computation infrastructure for use in SDE

Site:Manchester

Course:MSc Health Data Science

Biography:I studied for my undergraduate degree in Biological Sciences (Biotechnology with Enterprise) at the University of Leeds. Along the way, I realised I wanted to move in a more technical direction. This led me to pursue an MSc in Health Data Science at the University of Manchester. Here, I was introduced to the principles and approach of cloud engineering through my dissertation. That project revealed just how exciting and impactful technical infrastructure can be. I am now eager to explore roles that combine these skills and have a real-world impact in cloud infrastructure, DevOps, and data-driven research.

Project Summary:For my dissertation, I built a secure hybrid cloud architecture from scratch, linking on-premises resources with Azure to create a scalable, automated, and compliant environment for sensitive data. Having started with no prior experience in cloud engineering, I successfully designed and tested a reproducible blueprint demonstrating that hybrid models can be secure and cost-effective, and that it's possible to modernise infrastructure without compromising on governance requirements.

Supervision:My supervisors were invaluable in helping me make this leap into technical development. They guided me through unfamiliar tools like Terraform, GitHub Actions, and Visual Studio Code, helping me work through challenges, rather than just giving instructions, which in turn made me much more confident in applying the skills myself. With regular meetings and feedback, I was not only able to keep on track but was also presented with multiple opportunities to improve, whether that was my skills or the dissertation itself. Their input undoubtedly strengthened the outcome of the project.

With hands-on expertise in hybrid cloud design, CI/CD pipelines and compliance frameworks, I now feel much more confident and ready for roles in DevOps and cloud engineering and am more excited than ever to get started and apply these skills in real-world settings.

Rushil Singh

Hardware enabled Confidential Compute in the Health Data Science Research Domain

Site:Manchester

Course:MSc Health Data Science

Biography:I am currently pursuing an MSc in Health Data Science at the University of Manchester, with a background in Biotechnology and research experience spanning healthcare analytics, bioinformatics, genomics, and data science. I focus on applying data-driven approaches to solve real-world healthcare and research challenges through statistical modelling, machine learning, and scalable data workflows.

I have worked on projects involving NHS healthcare datasets, predictive modelling, dashboard development, and bioinformatics analysis using Python, Linux-based pipelines, and modern data science tools. My recent work includes developing machine learning models to predict NHS A&E breach rates, analyzing prescribing trends using large-scale healthcare datasets, and conducting genomic variant analysis for the TPMT gene.

Project Summary:I chose this project because I am interested in the growing importance of secure and scalable cloud infrastructure within healthcare and health data science. As healthcare systems increasingly rely on digital records, large datasets, and collaborative research, there is a rising need for cloud environments that can securely store, manage, and process sensitive patient information while remaining accessible to authorized users.

My focus is on understanding how cloud technologies can support secure data storage, privacy, reliability, and accessibility in healthcare settings. I am particularly interested in how cloud infrastructure can enable healthcare professionals and researchers to retrieve patient data efficiently from different locations while maintaining strong security and governance standards.

Such infrastructure could further support advancements in healthcare research by enabling secure data sharing, large-scale analysis, and improved collaboration between researchers and healthcare institutions across different regions and countries.

In the future, I hope to contribute towards building secure cloud environments that support healthcare systems on a larger scale, where patient data can be safely accessed across hospitals, regions, or even internationally to improve collaboration, continuity of care, and data-driven healthcare research.

Hevin Patel

Fine tuning LLMs using LoRA for RTT clock stop detection

Site:Lancaster

Course:MSc Computer Science

Biography:I am currently in the final year of my MSci degree in Computer Science at Lancaster University, where I have gained experience across a range of areas including programming, distributed systems, DevOps, artificial intelligence, and machine learning. I particularly enjoy working on projects involving generative AI to solve practical problems and I am interested in pursuing roles where I can continue learning and applying my skills in AI and emerging technologies within real-world environments.

Project Summary:During my second term, I had the opportunity to complete a 10-week placement with LTHTR, where I worked on the FastRTT project focused on developing an AI agent to detect NHS RTT clock stop events. This placement was an exceptional learning experience and gave me exposure to a wide range of real-world technologies and workflows.

As part of the project, I worked on fine-tuning different Qwen3 models using PEFT techniques and evaluating whether smaller models could achieve performance comparable to larger models for RTT clock stop detection tasks. I also gained hands-on experience with cloud-native technologies and real-world CNCF-based infrastructure, including debugging issues and deploying applications on Kubernetes clusters, and building CI/CD pipelines.
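As a rough illustration of why PEFT makes this kind of fine-tuning practical, the sketch below counts trainable parameters for LoRA, the method named in the project title. LoRA freezes a weight matrix of shape d_out x d_in and trains two low-rank factors, B (d_out x rank) and A (rank x d_in). The dimensions used are hypothetical and are not taken from any Qwen3 model or from the project's configuration.

```python
def full_params(d_out, d_in):
    """Parameters updated when a d_out x d_in weight matrix is fully fine-tuned."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only to make the arithmetic clear.
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.4%}")
```

With these assumed numbers the low-rank factors hold 1/256 of the parameters of the full matrix, which is the kind of saving that makes it feasible to compare several model sizes within a short placement.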

In addition, the placement gave me valuable insight into NHS operations, the projects under development, and how technology and AI can improve efficiency within healthcare systems. I also had the opportunity to submit a short abstract based on my work.

Prior to this placement, I had no experience with Kubernetes, so being able to work extensively with container orchestration and deployment systems was particularly rewarding. I was also fortunate to attend my first KubeCon with the team, which further strengthened my interest in cloud-native technologies and AI-driven systems.

Ambalika Kakoty

Securing KARECTL, A Kubernetes NHS Research Analytics Platform using Identity Based Network Policies and Policy as Code Governance

Site:Lancaster

Course:MSc Cyber Security

Biography:I began my career as a Java developer and progressively advanced through roles including Software Engineer, Senior Software Engineer, and ultimately Technical Lead, accumulating over eight years of industry experience in the supply chain and e-commerce domains. I also worked in the hospitality domain early in my career. Most recently, before leaving my job, my work centred on microservices-based, Kubernetes-hosted customer service applications, giving me good hands-on familiarity with the kind of infrastructure that underpins the KARECTL project.

Alongside my development and issue support responsibilities, I worked closely with my organisation’s application security team, where I developed a strong interest in security and in embedding it into software development. I helped maintain the security posture of our platform using SAST and DAST tools such as SonarQube and Burp Suite, performing security checks and ensuring OWASP Top 10 vulnerabilities were identified and mitigated across our codebase. This experience showed me that the delivery speed of software products can sometimes come at a cost to security, leaving authentication weaknesses and poor access controls to be patched after release. As a result, I became interested in bridging the gap between software development and security, and in building secure, resilient systems from the very start of the software development lifecycle.

This led me to pursue an MSc in Cyber Security at Lancaster University so that I can deepen my understanding of cybersecurity at an academic level. In my spare time, I enjoy exploring intentionally vulnerable demo microservices applications such as the Weaveworks Sock Shop, reverse engineering their security failures layer by layer and drafting controls to address them. The KARECTL project is a natural fit for me because it combines Kubernetes-based microservices, healthcare research infrastructure, and practical security challenges in a meaningful real-world environment.

Project Summary:This project addresses a critical network security gap in KARECTL, the Kubernetes-based Trusted Research Environment developed at Lancashire Teaching Hospitals NHS Foundation Trust. By default, Kubernetes permits unrestricted pod-to-pod communication across the cluster. This presents a significant security risk when handling sensitive NHS patient data, because a single compromised container could enable lateral movement and potential data exposure.
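For context, the usual first step against unrestricted pod-to-pod traffic is a default-deny policy. The sketch below builds one as a standard Kubernetes `NetworkPolicy` manifest and prints it as JSON. It is a generic illustration, not one of the identity-based policies this project investigates, and the namespace and policy name are hypothetical.

```python
import json


def default_deny_policy(namespace):
    """Build a NetworkPolicy manifest that denies all pod traffic in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty selector matches every pod in the namespace.
            "podSelector": {},
            # Both directions are listed with no allow rules, so all
            # ingress and egress traffic for the selected pods is denied.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


policy = default_deny_policy("research-workspace")  # hypothetical namespace
print(json.dumps(policy, indent=2))
```

A policy like this only takes effect when the cluster's network plugin enforces `NetworkPolicy` resources, and more specific allow rules are then layered on top of it.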

The project investigates, designs, implements, and evaluates a layered Kubernetes security architecture using Cilium, an eBPF-based, identity-driven network security platform, together with Kyverno, a Kubernetes-native policy engine for workload governance and admission control. Together, these technologies aim to strengthen network isolation, workload security, and policy enforcement across the platform.

The work will be evaluated through controlled attack simulations covering lateral movement, privilege escalation, identity spoofing, and data exfiltration, alongside observability analysis, quantitative performance benchmarking, and compliance mapping against the SATRE specification for Trusted Research Environments. Expected outputs include reusable security policy catalogues and practical recommendations to help strengthen governance and security within the KARECTL platform.