Notice

Open Assistant has now concluded. Please see this video for more information. Thank you to all those who made this project possible.

Introduction

The FAQ page is available here.

Open Assistant (abbreviated as OA) is a chat-based, open-source assistant. The vision of the project is to make a large language model that can run on a single high-end consumer GPU. With some modifications, Open Assistant should also be able to interface easily with third-party applications and to retrieve information from databases and the Internet.

You can play with our current best model here!

You should join the Open Assistant Discord server and/or comment on GitHub issues before making any major changes. Most developer communication takes place on the Discord server. There are four main areas that you can work on:

Ranking, labelling and writing responses on open-assistant.io. See the tasks section of the docs for more information.
Curating datasets and performing data augmentation. This includes scraping, gathering other public datasets, etc. Most of these efforts are concentrated in /data/datasets and are documented here.
Creating and fine-tuning Open Assistant itself. For that, you should pay special attention to /model.
Developing open-assistant.io. Take a close look at /website as well as /backend.

GitHub folders explanation

Do read the developer guide for further information.

Here's a list of the first-level folders on Open Assistant's GitHub page.

/ansible - for managing the full stack using Ansible
/assets - contains logos
/backend - backend for open-assistant.io and the Discord bots; may be helpful for testing API calls locally
/copilot - read more at AWS's Copilot. And no, this is not a folder that contains something similar to OpenAI's Codex.
/data - contains /data/datasets, which holds data scraping code and links to datasets on Hugging Face
/deploy
/discord-bot - Discord bot frontend for volunteer data collection
/docker
/docs - this website!
/inference - inference pipeline for Open Assistant model
/model - currently contains scripts and tools for training/fine-tuning Open Assistant and other neural networks
/notebooks - DEPRECATED in favor of /data/datasets. Contains Jupyter notebooks for data scraping and augmentation
/oasst-shared - shared Python code for Open Assistant
/scripts - contains miscellaneous scripts
/text-frontend
/website - everything in open-assistant.io, including gamification

Principles

We put the human in the center
We need to get the MVP out fast, while we still have momentum
We pull in one direction
We are pragmatic
We aim for models that can (or could, with some effort) be run on consumer hardware
We rapidly validate our ML experiments on a small scale, before going to a supercluster
