Notice
Open Assistant has now concluded. Please see this video for more information. Thank you to all those who made this project possible.
Introduction
The FAQ page is available here.
Open Assistant (abbreviated as OA) is a chat-based, open-source assistant. The vision of the project is to make a large language model that can run on a single high-end consumer GPU. With some modifications, Open Assistant should also be able to easily interface with third-party applications and retrieve information from databases and the Internet.
You can play with our current best model here!
You should join the Open Assistant Discord server and/or comment on GitHub issues before making any major changes. Most development discussion takes place on the Discord server. There are four main areas that you can work on:
- Ranking, labelling and writing responses on open-assistant.io. See the tasks section of the docs for more information.
- Curating datasets and performing data augmentation. This includes scraping, gathering other public datasets, etc. Most of these efforts are concentrated in /data/datasets and are documented here.
- Creating and fine-tuning Open Assistant itself. For that, pay special attention to /model.
- open-assistant.io development. Take a close look at /website as well as /backend.
GitHub folders explanation
Do read the developer guide for further information.
Here's a list of the first-level folders in Open Assistant's GitHub repository.
- /ansible: for managing the full stack using Ansible
- /assets: contains logos
- /backend: backend for open-assistant.io and the Discord bots; possibly helpful for testing API calls locally
- /copilot: read more at AWS's Copilot. And no, this is not a folder that contains something similar to OpenAI's Codex.
- /data: contains /data/datasets, which holds data scraping code and links to datasets on Hugging Face
- /deploy
- /discord-bot: Discord bot frontend for volunteer data collection
- /docker
- /docs: this website!
- /inference: inference pipeline for the Open Assistant model
- /model: currently contains scripts and tools for training/fine-tuning Open Assistant and other neural networks
- /notebooks: DEPRECATED in favor of /data/datasets. Contains Jupyter notebooks for data scraping and augmentation
- /oasst-shared: shared Python code for Open Assistant
- /scripts: contains various utility scripts
- /text-frontend
- /website: everything in open-assistant.io, including gamification
Principles
- We put the human in the center
- We need to get the MVP out fast, while we still have momentum
- We pull in one direction
- We are pragmatic
- We aim for models that can (or could, with some effort) be run on consumer hardware
- We rapidly validate our ML experiments on a small scale before going to a supercluster