Comment on page
In the steps below, we'll show you how Bito indexes your code, ensuring that each query you have is met with precise and contextually relevant information. From breaking down code into digestible chunks to leveraging advanced AI models for nuanced understanding, Bito transforms the daunting task of code analysis into a seamless and efficient experience.
Here's how the magic happens:
Dividing Code into Pieces
Bito starts by breaking down your source code files into smaller sections, known as 'chunks'. It’s like cutting up a long text into paragraphs to make it more manageable. Each chunk represents a piece of your code that can be individually indexed and analyzed.
Creating a Searchable Reference
After breaking down the file, each chunk is indexed, similar to creating a catalog entry. This step is crucial as it allows for the efficient location of the code segment later on.
Translating Code into Numeric Vectors
For every chunk, Bito generates a numeric vector or “embedding”. This process, which can be done using OpenAI or alternative open-source embedding models, translates the code into a mathematical representation. The idea is to create a form that can be easily compared and matched with other code chunks.
Saving the Essential Data
These embeddings are then stored in an index file on your machine. This index file is like a detailed directory, listing the file name, the location of the chunk within the file (start and end), and the embedding vector for each piece of code.
Understanding Your Questions
When you ask a question in Bito's chatbox, the AI checks whether it has some specific keywords like "my code", "my project", etc. If so, Bito generates a numeric vector for your query, mirroring the process used for code chunks.
Matching Your Query with Code
Using the query's vector, Bito searches the index to find the code chunk with the closest matching embedding. This step identifies the relevant sections of your codebase that can answer your question.
Building a Bigger Picture
Identifying chunks is just part of the process. Bito ensures that these chunks make sense in the broader context of your code. If necessary, it expands the search to include complete functions or related code segments, creating a fuller, more accurate context.
Consulting the AI Experts
With the context in hand, Bito consults with language models – either basic (GPT-3.5 Turbo, Claude Instant) or advanced (GPT-4, Claude 2) – to interpret the code within the context and provide an accurate response to your query.
Keeping Your Data Local
All the indexing and querying happens on your local machine. The index files are stored in the user’s home folder, for example on Windows the path will be something like C:\Users\Furqan\.bito\localcodesearch folder. It ensures that your code and session history remain private and secure.
Bito is committed to privacy. All LLM accounts it uses are under strict agreements to prevent your data from being used for training, recorded, or logged.
Reducing AI Fabrication
Bito is designed to minimize AI 'hallucinations' or fabrications, ensuring the answers you receive are based on your actual code. Although complete elimination of hallucination isn't feasible, as it sometimes aids in constructing beyond seen data, Bito strives to keep it in check, especially when dealing with your local code.
With these steps, Bito provides a robust and privacy-conscious method for indexing and understanding your code, simplifying navigation and enhancing productivity in your development projects.