LLM & Financial Markets - Internship
Paris, 75, FR
ABOUT CFM
Founded in 1991, we are a global quantitative and systematic asset management firm applying a scientific approach to finance to develop alternative investment strategies that create value for our clients.
We value innovation, dedication, collaboration, and the ability to make an impact. Together, we create a stimulating environment for talented and passionate experts in research, technology, and business to explore new ideas and challenge existing assumptions.
LLMs & Financial Markets: Measuring What the Market Didn’t Know Yet
Objective
Quantifying novelty in news and other textual information is critical for building high-performing, robust quantitative trading strategies. While widely used LLMs perform well on generalist benchmarks, they fail in highly specialized settings such as quantitative finance, and they exhibit look-ahead bias because their training data contains information from the future relative to the documents being analyzed. In this context, the internship will focus on pre-training and post-training multiple point-in-time financial LLMs on SEC filings (10-K / 8-K) to measure the novelty of information in each document and explore its link with future stock returns [1].
The project focuses on:
(i) building point-in-time (PIT) language models on SEC filings (10-K / 8-K) or other financial datasets,
(ii) quantifying the novelty or surprise of new information in these filings, and
(iii) evaluating the approach against benchmarks based on forecasting, synthetic data generation, and matching with business experts’ commentary.
Scope of Work
1. Data Processing
a. Explore publicly available 10-K / 8-K corporate SEC filings through the open BeanCounter dataset [2]
b. Study and compare this data with our internal datasets
2. LLM Training
a. Perform autoregressive pre-training and post-training of open-source architectures (DeepSeek, Qwen, LLaMA…) at different scales using state-of-the-art ML engineering methods (DeepSpeed, PEFT) [3]
b. Explore scaling model size and potentially establish domain-specific scaling laws (from millions to billions of parameters) [5]
c. Explore continual pre-training, online learning, and post-training (RLHF, DPO, SFT) to determine the best approaches for scaling
3. Novelty Score
a. Use LLMs to score sentences and isolate new information within each document
b. Reproduce and extend the analysis of 10-K filings proposed in the literature [1]
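As a rough illustration of the sentence-scoring idea above, one common way to express novelty is surprisal: the average negative log-probability a language model assigns to the tokens of a sentence. The sketch below is only an assumption about how such a score might be aggregated, not the method used in [1] or at CFM. The function names and the example log-probabilities are hypothetical; in practice the token log-probabilities would come from a point-in-time model trained only on filings available before the document's date.

```python
import math
from typing import Dict, List, Sequence, Tuple


def sentence_surprisal(token_logprobs: Sequence[float]) -> float:
    """Mean surprisal in bits per token.

    token_logprobs are natural-log probabilities of each token given its
    preceding context, as produced by any autoregressive language model.
    """
    if not token_logprobs:
        raise ValueError("sentence has no tokens")
    return -sum(token_logprobs) / (len(token_logprobs) * math.log(2))


def rank_by_novelty(doc: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (sentence, surprisal) pairs, most surprising sentence first."""
    scored = [(sentence, sentence_surprisal(lps)) for sentence, lps in doc.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical log-probabilities for two sentences of a filing.
example_doc = {
    "Revenue increased compared to the prior year.": [math.log(0.5)] * 4,
    "The company identified a material weakness.": [math.log(0.125)] * 4,
}
ranking = rank_by_novelty(example_doc)
```

Averaging per token keeps long and short sentences comparable; summing instead would favor long sentences, which may or may not be desirable depending on the downstream use.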
4. Benchmark the LLM-Based Solution
a. Compare LLMs with standard statistical models (e.g., NLTK Maximum Likelihood Estimator)
b. Compare with other generic approaches using embeddings and other state-of-the-art methods (for example, NovaScore [4])
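To make the statistical baseline above concrete, here is a minimal sketch of a bigram language model used as a novelty scorer, written in plain Python so that it runs without dependencies. It is a stand-in for the NLTK estimator mentioned above, not a reproduction of it: NLTK's MLE model is unsmoothed, whereas this sketch swaps in add-one (Laplace) smoothing and an `<unk>` token so that unseen words receive a finite score. The class name, tokenization, and toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List


class BigramBaseline:
    """Add-one smoothed bigram model trained on tokenized sentences."""

    def __init__(self, corpus: List[List[str]]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        self.vocab = {"<s>", "</s>", "<unk>"}
        for sentence in corpus:
            tokens = ["<s>"] + sentence + ["</s>"]
            self.vocab.update(tokens)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def _known(self, token: str) -> str:
        # Map out-of-vocabulary tokens to the unknown-word symbol.
        return token if token in self.vocab else "<unk>"

    def logprob(self, prev: str, word: str) -> float:
        prev, word = self._known(prev), self._known(word)
        numerator = self.bigram_counts[(prev, word)] + 1
        denominator = self.context_counts[prev] + len(self.vocab)
        return math.log(numerator / denominator)

    def surprisal(self, sentence: List[str]) -> float:
        """Mean surprisal of a sentence in bits per bigram transition."""
        tokens = ["<s>"] + sentence + ["</s>"]
        logprobs = [self.logprob(p, w) for p, w in zip(tokens[:-1], tokens[1:])]
        return -sum(logprobs) / (len(logprobs) * math.log(2))


# Toy "past filings" corpus; a point-in-time setup would train only on
# text published before the document being scored.
past = [["revenue", "increased"], ["revenue", "increased"]]
model = BigramBaseline(past)
seen_score = model.surprisal(["revenue", "increased"])
new_score = model.surprisal(["revenue", "restated"])
```

A sentence containing wording absent from earlier filings (`new_score`) receives a higher surprisal than a repeated one (`seen_score`), which is the behavior an LLM-based score would be benchmarked against.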
5. Assessment of Extraction Quality (core component of the internship)
a. Synthetic dataset generation
b. Compare newly detected events with associated commentary from business experts
c. Evaluate the forecasting capability of models that use only information labeled as “new”
References
1. Costello, Anna M., Bradford Levy, and Valeri V. Nikolaev. "Representations of Investor Beliefs." Available at SSRN 5717862 (2023).
2. Wang, Siyan, and Bradford Levy. "BeanCounter: A low-toxicity, large-scale, and open dataset of business-oriented text." Advances in Neural Information Processing Systems 37 (2024): 91653-91690.
3. DeepSeek-AI. "DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models." (2025).
4. Ai, Lin, et al. "Novascore: A new automated metric for evaluating document level novelty." Proceedings of the 31st International Conference on Computational Linguistics. 2025.
5. Hoffmann, Jordan, et al. "Training Compute-Optimal Large Language Models." 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
Required Skills and Qualifications
• Pursuing a degree in NLP, Data Science, Computer Science, or a related field
• Experience with, and strong interest in, state-of-the-art machine learning and NLP models and concepts
An ideal candidate would have
• Experience with AWS and Cloud Computing
• Experience training and fine-tuning deep learning models at scale
• Experience in writing efficient PyTorch code
EQUAL OPPORTUNITIES STATEMENT
We are continuously striving to be an equal opportunity employer and we prohibit any discrimination based on sex, disability, origin, sexual orientation, gender identity, age, race, or religion. We believe that our diversity, breadth of experience, and multiple points of view are among the leading factors in our success.
CFM is a signatory of the Women Empowerment Principles.
FOLLOW US
Follow us on Twitter or LinkedIn or visit our website to find out more about CFM.