Internship - Data Referential - Exploring LLMs and RAG to Retrieve Internal Data
Paris, 75, FR
ABOUT CFM
Founded in 1991, we are a global quantitative and systematic asset management firm applying a scientific approach to finance to develop alternative investment strategies that create value for our clients.
We value innovation, dedication, collaboration, and the ability to make an impact. Together, we create a stimulating environment for talented and passionate experts in research, technology, and business to explore new ideas and challenge existing assumptions.
Exploring LLMs and RAG for enhanced internal data accessibility and visualization
Internship Overview:
Join our Data Referential team as we explore Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) to transform the way we access and interact with our internal data. This internship offers a unique opportunity to develop a system that lets users query our data referential intuitively through natural language prompts and visualize the results as simple graphs.
Key Responsibilities:
Collaborate with our data and development teams to understand the existing data models and requirements.
Implement and integrate an LLM to interpret and respond to natural language prompts.
Develop a RAG system to enhance the relevance and accuracy of the information retrieved from our data repositories.
Create a user-friendly interface for querying and visualizing data.
Use data visualization libraries to generate simple, informative graphs based on user queries.
Test and refine the system to ensure accuracy, performance, and user satisfaction.
Document processes, workflows, and findings throughout the project.
Desired Skills and Qualifications:
Familiarity with large language models and natural language processing. We are seriously considering using LangChain for this internship.
Experience with Python and relevant libraries (e.g., Matplotlib, Seaborn).
Knowledge of SQL.
Knowledge of search and retrieval systems (e.g., Elasticsearch) is a nice-to-have.
Strong problem-solving skills and the ability to work independently as well as collaboratively.
Proficiency in web development frameworks and tools (e.g., Flask, React) is a nice-to-have; we also value knowledge of prototyping tools such as Streamlit.
Eagerness to learn and adapt to new technologies and challenges.
Learning Outcomes:
Gain hands-on experience with state-of-the-art AI and NLP technologies.
Enhance your skills in data retrieval, processing, and visualization.
Improve your ability to design and implement user-centered data solutions.
Contribute to a project with real impact for our team and the users of our data.
Useful links:
https://python.langchain.com/docs/tutorials/sql_qa/
https://medium.com/dataherald/high-accuracy-text-to-sql-with-langchain-840742133b83
https://medium.com/@rehmana.younis1/title-building-an-advanced-log-analyzer-chatbot-with-llms-rag-and-streamlit-8b8a203487c0
EQUAL OPPORTUNITIES STATEMENT
We are continuously striving to be an equal opportunity employer and we prohibit any discrimination based on sex, disability, origin, sexual orientation, gender identity, age, race, or religion. We believe that our diversity, breadth of experience, and multiple points of view are among the leading factors in our success.
CFM is a signatory of the Women Empowerment Principles.
FOLLOW US
Follow us on Twitter or LinkedIn, or visit our website to find out more about CFM.