Valentin Velev

MSc Data Science | Agentic AI Developer | NLP Researcher

Projects

RAG4AL: A local RAG for academic literature

This project is about creating a local retrieval-augmented generation (RAG) application for academic literature. RAG was first introduced in Lewis et al. (2020) and can be summarized as a technique that leverages large language models (LLMs) to efficiently retrieve and synthesize factual information from a (local) database while reducing hallucinations. In other words, it is a tool that allows the user to search or synthesize text across many documents (e.g., PDF files) using the chat interface of an LLM (think of it as a local ChatGPT whose knowledge base is limited to your own files). It involves vectorizing text documents (e.g., using Sentence-BERT), indexing the resulting vectors (e.g., using FAISS), and then using an instruction-tuned LLM (e.g., LLaMA 3) to generate an answer from the passages retrieved for the user's query.
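
The vectorize, index, and retrieve steps described above can be illustrated with a minimal, dependency-free sketch. This is not the RAG4AL implementation: the bag-of-words `embed` function is a toy stand-in for Sentence-BERT, the plain Python list is a stand-in for a FAISS index, and all function names (`embed`, `cosine`, `build_index`, `retrieve`, `build_prompt`) and the sample chunks are hypothetical. The final prompt is what would be passed to an instruction-tuned LLM; the generation step itself is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words term counts (stand-in for Sentence-BERT)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(chunks):
    """Store each chunk next to its vector (stand-in for a FAISS index)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, passages):
    """Assemble the grounded prompt that would be sent to the LLM."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical document chunks, e.g. sentences extracted from PDF files.
chunks = [
    "Retrieval-augmented generation combines a retriever with a text generator.",
    "FAISS is a library for similarity search over dense vectors.",
    "Sentence-BERT produces sentence embeddings suitable for semantic search.",
]
question = "Which library does similarity search over vectors?"

index = build_index(chunks)
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real setup, `embed` would presumably be replaced by a sentence-embedding model and the list scan by a vector index, but the flow stays the same: embed the chunks once, embed each query, fetch the nearest chunks, and restrict the LLM to that context.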