Security-Aware Code Generation and Evaluation
Large Language Models are reshaping software development, but their ability to generate secure code remains an open challenge. SecureCoder is a collaborative research initiative at TU Darmstadt that tackles this problem end-to-end: from curating security-focused training data and building robust evaluation benchmarks, to training security-aware models and deploying constraint-based decoding techniques that prevent vulnerabilities at generation time.
Our work brings together expertise in programming languages, natural language processing, and machine learning to produce developer-friendly tools, including an IDE plugin and a unified benchmarking CLI, that raise the bar for safe AI-assisted development.
Four interconnected pillars of security-aware code generation
High-quality security data is the foundation of everything we do. We design systematic pipelines to collect, annotate, and curate code samples that exhibit real-world vulnerabilities, covering CWE classes such as buffer overflows, SQL injection, and path traversal. By building standardized, reproducible datasets, we enable rigorous cross-model comparison and replicable research across the community.
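To make the curation step concrete, here is a minimal sketch of what one annotated dataset record might look like. The `VulnSample` class and all of its field names are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class VulnSample:
    """One hypothetical curated entry pairing a vulnerable snippet with its fix."""
    code: str          # the vulnerable code fragment
    language: str      # source language of the snippet
    cwe_id: str        # CWE class label, e.g. "CWE-89" for SQL injection
    description: str   # short human-written annotation
    patched_code: str  # the reviewed secure rewrite

sample = VulnSample(
    code='query = "SELECT * FROM users WHERE name = \'" + name + "\'"',
    language="python",
    cwe_id="CWE-89",
    description="SQL query built by string concatenation of user input.",
    patched_code='cursor.execute("SELECT * FROM users WHERE name = %s", (name,))',
)

record = asdict(sample)  # plain-dict form, ready for a standardized dataset file
```

Pairing each vulnerable fragment with a reviewed fix is what makes such records usable both for training security-aware models and for benchmark construction.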
We develop training regimes that embed security awareness directly into language models. This includes fine-tuning with symbolic execution feedback to identify vulnerabilities at training time, GNN-augmented pre-training that models code graph structure for improved accuracy, and obfuscation grounding to enhance model robustness against adversarial or insecure patterns.
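As one hedged illustration of how analyzer feedback could enter a training objective, the sketch below applies an unlikelihood-style penalty to tokens an external checker has flagged as insecure, alongside the usual likelihood term on safe tokens. The function, its inputs, and the flagging signal are hypothetical stand-ins, not the project's actual symbolic-execution pipeline or loss.

```python
import math

def security_aware_loss(token_probs, flagged):
    """Toy per-token objective: standard negative log-likelihood on
    unflagged tokens, plus an unlikelihood term that pushes probability
    mass away from tokens an external analyzer flagged as insecure."""
    loss = 0.0
    for p, bad in zip(token_probs, flagged):
        if bad:
            loss += -math.log(max(1.0 - p, 1e-9))  # discourage the flagged token
        else:
            loss += -math.log(max(p, 1e-9))        # usual likelihood term
    return loss / len(token_probs)

# Hypothetical usage: the model assigns 0.9 to a flagged token and 0.4
# to a safe one; the flagged token dominates the loss and gets pushed down.
example = security_aware_loss([0.9, 0.4], [True, False])
```

The key design choice is that the analyzer's verdict changes the direction of the gradient per token, rather than filtering whole samples out of the training set.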
Rather than relying solely on post-hoc filtering, we integrate security constraints directly into the decoding process. A dedicated discriminator model critiques generated code in real time, steering the generator away from known vulnerability patterns. This approach combines security-first prompting with formal verification principles to enforce structural safety properties during generation.
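A minimal sketch of the discriminator-steered idea, assuming a toy regex-based discriminator and a candidate-reranking step at decoding time. The pattern list, scores, and function names are illustrative assumptions, not the project's actual models.

```python
import re

# Toy discriminator: flags continuations matching known insecure patterns.
INSECURE_PATTERNS = [
    re.compile(r"\bstrcpy\s*\("),             # unbounded copy (CWE-120)
    re.compile(r"\bexecute\([^,)]*\+\s*\w"),  # SQL built by concatenation (CWE-89)
]

def vulnerability_score(code: str) -> float:
    """Number of insecure patterns the candidate code triggers."""
    return float(sum(bool(p.search(code)) for p in INSECURE_PATTERNS))

def rerank(prefix, candidates, lam=10.0):
    """Pick the candidate maximizing the generator's score minus a
    discriminator penalty, steering decoding away from flagged code."""
    def combined(cand):
        text, gen_score = cand
        return gen_score - lam * vulnerability_score(prefix + text)
    return max(candidates, key=combined)[0]

prefix = "cursor."
candidates = [  # (continuation, hypothetical generator log-probability)
    ('execute("SELECT * FROM t WHERE id=" + uid)', -1.0),    # likelier but unsafe
    ('execute("SELECT * FROM t WHERE id=%s", (uid,))', -1.5),
]
best = rerank(prefix, candidates)
```

Even though the insecure continuation scores higher under the generator, the discriminator penalty flips the ranking toward the parameterized query, which is the essence of steering at generation time rather than filtering afterwards.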
Our research is deployed as practical tools: a JetBrains IDE plugin that highlights vulnerabilities and suggests secure alternatives in real time, and a unified CLI benchmarking framework for evaluating any model against industry-standard security suites including CWEval, SecCodePLT, and CyberSecEval. Both tools are openly released for use by the broader research community.
Open tools built from our research, ready to use in your workflow
The plugin brings security awareness directly into your development environment: it analyzes code suggestions in real time, highlights potential vulnerabilities, and proposes secure alternatives, all powered by our constraint decoding and security-first prompting techniques.
A unified command-line interface for rigorously evaluating the security posture of code generation models. Benchmark any model from OpenRouter or a local agent against industry-standard security suites including CWEval, SecCodePLT, and CyberSecEval, all in one tool.
Research outputs from the SecureCoder project
Principal investigators and researchers driving the SecureCoder initiative
Programming languages, software engineering, and AI-assisted development with a focus on code correctness and security.
Natural language processing, information extraction, and large language model research for understanding and generating text and code.
Machine learning, statistical relational AI, and deep learning with applications to structured prediction and reasoning.