Search AI/ML

Found 3 bookmarks

RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training Large Language Models — Together AI

Releasing a new version of the RedPajama dataset, with 30 trillion filtered and deduplicated tokens (100+ trillion raw) from 84 CommonCrawl dumps covering 5 languages, along with 40+ pre-computed data quality annotations that can be used for further filtering and weighting.

#model training #training

·together.ai·Nov 1, 2023

How to benchmark Llama2 Uncensored vs. GPT-3.5 on your own inputs | promptfoo

#prompt #training #model training

·promptfoo.dev·Sep 10, 2023

Simplifying Transformers: State of the Art NLP Using Words You Understand- Part 1 — Intro

This is the first in a series talking about Transformers. As of now, there are quite a few good resources on Transformers, so why make…

#learning #training

·towardsdatascience.com·Aug 17, 2023
