The ABCs of LLM App Development: Embracing AI with GPT-4
A Step-by-Step Guide to Harnessing the Power of Large Language Models
Large language models like the one behind ChatGPT are all the rage, but getting started can be overwhelming. There’s a long list of new technologies and terminology to learn.
Even experienced developers may also struggle because LLMs represent a new way to build software; LLM apps are probabilistic rather than deterministic. However, for the most part, getting started with a LLM app is straightforward because they are akin to a traditional web app. Getting started is easier than you think.
LLM
OpenAI’s GPT3.5-turbo API, is widely available and easy to use. OpenAI has a handful of models, but 3.5-turbo is good enough for most use-cases and very affordable. One thing to consider: GPT-4 has a bigger context window (read more about context windows), so that may be immediately useful in certain situations. GPT-4 is also more capable but slower and more expensive (30x for completions!) There are many other options, including open source models, but this is the fastest way to get started, and reach your own “aha” moment.
Programming language
Learn basic Python. A lot of development elsewhere in the AI/ML stack is in Python, so while not immediately required to call an API, investing in the ecosystem will pay off. If you’re looking to get familiar or comfortable with Python, Automate the Boring Stuff with Python, is great resource, and it is free. To be clear, mastery of Python is not necessary, just basic competence.
Typescript is also popular, though its AI/ML ecosystem is not as strong. You’ll likely need a frontend as well (more below), for which Typescript is a great option.
Application
A backend API is likely required for any LLM app. Assuming you’re in the Python ecosystem, I recommend FastAPI because it is easy to get started with, and has a ton of community support. FastAPI also natively supports async code which can be very useful when calling LLM APIs since they can have very overall high latency (eg minutes).
You may eventually outgrow FastAPI because while it is powerful and configurable, it may lack certain functionality. Specifically, sometimes developers have to implement undifferentiated functionality in FastAPI like auth or user session management. While it has a learning curve, I recommend Django when there’s a lot more to do. Django has established patterns for undifferentiated web app features, and has enormous community support.
On the frontend, I recommend Next.js or vanilla React. They have the largest communities, the best tooling and easiest paths to shipping software.
LLM Orchestration
OpenAI’s APIs are great standalone. However, as an application becomes more complex, it may be necessary to introduce other constructs and glue code. This is where Langchain comes in. It’s a framework for LLM app development. What Ruby on Rails and Django were to web development in the 2000s, Langchain is positioning itself to be for LLM app development in the 2020s. Similar to the traditional web frameworks, there’s a learning curve, so I don’t recommend diving in head-first. Langchain is also a great way to stay abreast of the LLM ecosystem because a lot of new techniques, technologies and ideas land there quickly; there’s been a dizzying 100+ releases of Langchain in 2023 alone! An alternative I’m exploring is Haystack.
Vector database
There’s a lot of hype around vector databases, with companies raising tens of millions of dollars in the last couple of months.
A quick primer:
A vector database is a specialized database that stores vectors, which are mathematical representations of complex data like text, images or audio. Unlike traditional databases, which store text or numbers, vector databases allow efficient similarity searches. By finding vectors that are “close” to a given vector, it can quickly identify similar pieces of data. This makes vector databases particularly useful in fields like machine learning and artificial intelligence.
- ChatGPT
Colloquially, Pinecone, a leading vector DB provider refers to vector DBs as “long-term memory for AI”. This is a good rough mental model, though memory can be manifest in different forms too. Practically speaking, a vector DB is useful when an application needs to search across many pieces of texts to facilitate downstream LLM usage. For example, if an application needs to use a 100-page document, it may index chunks of that document in a vector DB, and then retrieve the most relevant portions when a user submits a question. This is effectively how many “chat with your docs” apps like PDF.ai work.
You don’t need a vector DB to get started with LLMs. However, if you find yourself needing one, I recommend using pgvector, a Postgres vector DB extension. pgvector is supported in AWS RDS, and Render, amongst other cloud providers, so this reduces complexity by reducing the total number of services backing an application. pgvector has the core features necessary for a vector DB, but you may eventually outgrow it. I intend to compare vector DBs, their features and cost-effectiveness in a future post.
Prompts
Crafting the right prompts is an iterative and creative process. I recommend the following resources to get more familiar with how prompting works, and what makes for good prompts:
ChatGPT Prompt Engineering for Developers This is a great video introduction from OpenAI and DeepLearning.AI. It’s concise yet immediately useful. If you consume one resource on prompt engineering, make it this one.
OpenAI cookbook: code examples of various techniques with practical OpenAI integrations.
Prompt-Engineering-Guide: an exhaustive reference for various techniques. Includes references to papers, and helpful examples.
There are many other great prompt resources, but these are good enough to get started with; they provide a breadth of coverage.
Tools
I covered these in my last post on context windows:
Prompts and token budgeting require some iteration. I found the following tools very helpful in refining my script:
OpenAI web tokenizer: this tool visualizes token usage for a piece of text. It’s essentially how the model “sees” and generates text.
tiktoken: This is a python package that tokenizes text. This is essentially the programmatic version of the web tokenizer. I use it for programmatic token budgeting, since it can operate on user-specified strings.
OpenAI Playground: this is a web console OpenAI provides to experiment with the API. I find it very useful for quickly prototyping a prompt, without needing to pipe everything together in code.
Deployment
When it comes to ship your LLM app, I recommend the following:
Render: for general backend hosting, including databases, caches, etc.
Vercel: for frontend hosting.
Clerk or Auth0 for auth. This will save you a ton of time implementing undifferentiated OAuth flows to enable users to log in with their existing accounts (eg Google, Apple, etc)
Closing
That’s it! While there’s a lot of learn about LLM development, I recommend simply getting started and using only the parts you need.