Valentin Velev

MSc Data Science Student | Researcher | Aspiring Data Scientist

Projects

AL-RAG: A local RAG for academic literature

In this project, I will try to create a local retrieval-augmented generation (RAG) app for academic literature. RAG was first introduced in Lewis et al. (2020). It can be summarized as a technique that combines large language models (LLMs) with retrieval over a (local) database, so that the model can efficiently retrieve and synthesize factual information while minimizing hallucinations. In other words, it is a tool that lets the user search for or synthesize text across many text files (e.g., PDF files) through the chat interface of an LLM. Think of it as a local ChatGPT whose knowledge base is limited to your own files. It involves vectorizing the text documents (e.g., using Sentence-BERT), indexing the resulting vectors (e.g., using FAISS), and then passing the most relevant passages to an LLM with a chat feature (e.g., LLaMA 3).
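To make the retrieval step concrete, here is a minimal, dependency-free sketch of the pipeline described above. It is not the project's implementation. Assumptions: a toy bag-of-words `embed` function stands in for Sentence-BERT embeddings, a brute-force `FlatIndex` class stands in for a FAISS flat index (such as `IndexFlatIP`), and `DOCUMENTS` and `build_prompt` are hypothetical names. The final LLM call (e.g., to LLaMA 3) is left out; the sketch only builds the prompt that would be sent to it.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for text chunks extracted from PDF files.
DOCUMENTS = [
    "Retrieval-augmented generation combines a retriever with a generator.",
    "FAISS provides efficient similarity search over dense vectors.",
    "Sentence-BERT produces sentence embeddings for semantic similarity.",
    "LLaMA 3 is a family of open large language models.",
]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumed stand-in for a Sentence-BERT embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class FlatIndex:
    """Brute-force index (assumed stand-in for a FAISS flat index)."""

    def __init__(self) -> None:
        self.vectors: list[Counter] = []

    def add(self, vectors: list[Counter]) -> None:
        self.vectors.extend(vectors)

    def search(self, query: Counter, k: int) -> list[tuple[float, int]]:
        """Score every stored vector; return up to k (score, doc_id) pairs with score > 0."""
        scores = [cosine(query, v) for v in self.vectors]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [(scores[i], i) for i in order[:k] if scores[i] > 0]


def build_prompt(question: str, index: FlatIndex, documents: list[str], k: int = 2) -> str:
    """Retrieve the top-k passages and wrap them in a prompt for the chat LLM."""
    hits = index.search(embed(question), k)
    context = "\n".join(f"- {documents[i]}" for _, i in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    index = FlatIndex()
    index.add([embed(d) for d in DOCUMENTS])
    print(build_prompt("How does FAISS search vectors?", index, DOCUMENTS))
```

Filtering out zero-score passages keeps unrelated text out of the LLM's context. In a real version, swapping in Sentence-BERT embeddings would also match paraphrases that share no exact words with the question, which this toy version cannot do.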

RoBERTa-BG: A RoBERTa-based state-of-the-art classification model for Bulgarian text

At a recent NLP conference in Varna, Bulgaria, three Bulgarian computer scientists presented two BERT- and GPT-based text classification models for Bulgarian that outperformed previous classifiers for the language.

Two sides of the same coin? The electorates of the GRÜNE Schweiz and the Grünliberale Partei Schweiz

Are the electorates of GRÜNE Schweiz and the Grünliberale Partei Schweiz similar? How do they differ? These are two of the questions that Lukas Rudolph (University of Konstanz) and I aim to answer in this project.

The efficacy of synthetic data for market research: A GPT-2 and LLaMA 3 approach
