Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens
Together, the developer, claims it is the largest public dataset specifically for language model pre-training
GPT-4 – Dr Alan D. Thompson – Life Architect
RLHF: Reinforcement Learning from Human Feedback
Integrated AI: The sky is comforting (2023 AI retrospective) – Dr Alan D. Thompson – Life Architect
togethercomputer/RedPajama-Data-V2 · Open source community will forever be indebted to Together AI.
Top 10 List of Large Language Models in Open-Source
RedPajama Project: An Open-Source Initiative to Democratizing LLMs - KDnuggets
Benjamin Rogers on LinkedIn: RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training…
RedPajama-Data-v2: An open dataset with 30 trillion tokens for training large language models
Together AI Releases RedPajama v2: An Open Dataset with 30 Trillion Tokens for Training Large Language Models - MarkTechPost
togethercomputer/RedPajama-Data-1T-Sample · Datasets at Hugging Face