Indexing
Indexing involves breaking down a source code file into smaller chunks and converting these chunks into embeddings that can be stored in a vector database. Bito indexes your entire codebase locally (on your machine) to understand it and provide answers tailored to your code.
Learn more about Bito's AI that Understands Your Code feature.
How Bito Indexes Your Code
The steps below show how Bito indexes your code so that each query is answered with precise, contextually relevant information. The process runs from splitting your code into manageable chunks to using AI models to interpret the retrieved context.
Here's how the magic happens:
Step 1: Chunk Breakdown
Dividing Code into Pieces
Bito starts by breaking down your source code files into smaller sections, known as 'chunks'. It’s like cutting up a long text into paragraphs to make it more manageable. Each chunk represents a piece of your code that can be individually indexed and analyzed.
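As a rough illustration of the chunking idea, the sketch below splits a file into fixed-size line ranges. It is only a minimal example: the Chunk type, the chunk_source function, and the 40-line limit are hypothetical names and values chosen for illustration, and Bito's actual chunking strategy is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    file_name: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def chunk_source(file_name: str, source: str, max_lines: int = 40) -> List[Chunk]:
    """Split source text into consecutive chunks of at most max_lines lines."""
    lines = source.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        chunks.append(
            Chunk(file_name, start + 1, start + len(block), "\n".join(block))
        )
    return chunks


# Example: a 100-line file becomes three chunks (40 + 40 + 20 lines).
example_source = "\n".join(f"x{i} = {i}" for i in range(100))
example_chunks = chunk_source("example.py", example_source)
```

Fixed-size splitting is the simplest option; a real tool might instead split on function or class boundaries so that each chunk is a meaningful unit.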
Step 2: Indexing Each Chunk
Creating a Searchable Reference
After breaking down the file, each chunk is indexed, similar to creating a catalog entry. This step is crucial as it allows for the efficient location of the code segment later on.
Step 3: Generating Embeddings
Translating Code into Numeric Vectors
For every chunk, Bito generates a numeric vector or “embedding”. This process, which can be done using OpenAI or alternative open-source embedding models, translates the code into a mathematical representation. The idea is to create a form that can be easily compared and matched with other code chunks.
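To make the idea of an embedding concrete, here is a toy stand-in that maps text to a fixed-length numeric vector by hashing identifiers into buckets. This is not how an OpenAI or open-source embedding model works internally; toy_embedding and its 64-dimension default are assumptions made purely so that the example runs without any external service.

```python
import hashlib
import math
import re
from typing import List


def toy_embedding(text: str, dims: int = 64) -> List[float]:
    """Hash each identifier-like token into one of `dims` buckets, then L2-normalize."""
    vec = [0.0] * dims
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Text with no tokens stays as the zero vector.
    return [v / norm for v in vec] if norm else vec


example_vector = toy_embedding("def add(a, b): return a + b")
```

The useful property is the same one a real model provides, only much cruder: the same text always yields the same vector, and texts that share tokens yield vectors that point in similar directions.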
Step 4: Storing the Vectors
Saving the Essential Data
These embeddings are then stored in an index file on your machine. This index file is like a detailed directory, listing the file name, the location of the chunk within the file (start and end), and the embedding vector for each piece of code.
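The kind of record described above could be laid out as follows. The JSON format, the field names (file, start, end, embedding), and the save_index and load_index helpers are all hypothetical; the actual on-disk format of Bito's index file is not specified in this document.

```python
import json
from typing import Dict, List


def save_index(entries: List[Dict], path: str) -> None:
    """Write index entries to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"entries": entries}, f)


def load_index(path: str) -> List[Dict]:
    """Read index entries back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["entries"]


# One record per chunk: file name, chunk location, and embedding vector.
example_entries = [
    {"file": "src/app.py", "start": 1, "end": 40, "embedding": [0.12, 0.88, 0.0]},
    {"file": "src/app.py", "start": 41, "end": 57, "embedding": [0.5, 0.25, 0.75]},
]
```

Storing the start and end positions alongside each vector is what lets a later search map a matching embedding back to the exact lines of code it came from.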
Step 5: Query Embedding
Understanding Your Questions
When you ask a question in Bito's chatbox, Bito checks whether the question contains specific keywords such as "my code" or "my project". If it does, Bito generates a numeric vector for your query, mirroring the process used for code chunks.
The complete list of these keywords is given on our Available Keywords page.
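A keyword check of this kind can be sketched in a few lines. The matching rule shown here (case-insensitive substring match after collapsing whitespace) is an assumption, and EXAMPLE_KEYWORDS contains only the two keywords quoted above rather than the complete list.

```python
from typing import Sequence

# Only the two keywords named in this section; the full list lives elsewhere.
EXAMPLE_KEYWORDS = ("my code", "my project")


def should_search_index(question: str, keywords: Sequence[str] = EXAMPLE_KEYWORDS) -> bool:
    """Return True if the question mentions any trigger keyword."""
    normalized = " ".join(question.lower().split())
    return any(keyword in normalized for keyword in keywords)


uses_index = should_search_index("How does login work in my code?")
skips_index = should_search_index("What is a closure?")
```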
Step 6: Finding the Nearest Neighbor
Matching Your Query with Code
Using the query's vector, Bito searches the index for the code chunks whose embeddings most closely match it. This step identifies the sections of your codebase that are most relevant to your question.
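A nearest-neighbor lookup can be sketched as a brute-force scan that scores every stored vector against the query. Cosine similarity is a common choice of metric, but it is an assumption here, as are the top_k function and the entry layout; the metric and search structure Bito actually uses are not stated in this document.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float], entries: List[Dict], k: int = 3) -> List[Tuple[float, Dict]]:
    """Return the k entries most similar to the query, best match first."""
    scored = [(cosine_similarity(query_vec, e["embedding"]), e) for e in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical two-dimensional embeddings, kept tiny for readability.
EXAMPLE_ENTRIES = [
    {"file": "a.py", "start": 1, "end": 20, "embedding": [1.0, 0.0]},
    {"file": "b.py", "start": 1, "end": 15, "embedding": [0.0, 1.0]},
    {"file": "c.py", "start": 5, "end": 30, "embedding": [0.7, 0.7]},
]
example_hits = top_k([1.0, 0.1], EXAMPLE_ENTRIES, k=2)
```

With the sample data, the query vector points almost along the first axis, so a.py ranks first and c.py second. A linear scan like this is fine for small indexes; larger ones typically use an approximate nearest-neighbor structure instead.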
Step 7: Contextualization
Building a Bigger Picture
Identifying chunks is just part of the process. Bito ensures that these chunks make sense in the broader context of your code. If necessary, it expands the search to include complete functions or related code segments, creating a fuller, more accurate context.
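One way to expand a matched chunk to a complete function is to look for the smallest function definition that encloses the chunk's line range. The sketch below does this for Python source using the standard ast module; the expand_to_enclosing_function helper is hypothetical, and it covers only Python files, whereas a real tool would need a parser for each supported language.

```python
import ast
from typing import Tuple


def expand_to_enclosing_function(source: str, start_line: int, end_line: int) -> Tuple[int, int]:
    """Return the line range of the smallest function containing the given lines.

    Falls back to the original range if no function encloses it.
    Line numbers are 1-based and inclusive.
    """
    best = (start_line, end_line)
    best_size = None
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.lineno <= start_line and end_line <= node.end_lineno:
            size = node.end_lineno - node.lineno
            if best_size is None or size < best_size:
                best, best_size = (node.lineno, node.end_lineno), size
    return best


EXAMPLE_SOURCE = (
    "import os\n"
    "\n"
    "def load(path):\n"
    "    with open(path) as f:\n"
    "        data = f.read()\n"
    "    return data\n"
    "\n"
    "print('done')\n"
)
expanded = expand_to_enclosing_function(EXAMPLE_SOURCE, 5, 5)
unchanged = expand_to_enclosing_function(EXAMPLE_SOURCE, 8, 8)
```

Here a match on line 5 alone is widened to lines 3 through 6, the whole load function, while the top-level statement on line 8 is left as it is.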
Step 8: Leveraging Language Models
Consulting the AI Experts
With the context in hand, Bito consults a language model, either basic (GPT-4o mini and similar models) or advanced (GPT-4o, Claude 3.5 Sonnet, and other best-in-class AI models), to interpret the code within that context and provide an accurate response to your query.
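Before a language model can be consulted, the retrieved code and the question have to be combined into a single prompt. The sketch below shows one plausible way to do that; the build_prompt function, the instruction wording, and the snippet fields are all hypothetical, and it deliberately stops short of calling any model API.

```python
from typing import Dict, List


def build_prompt(question: str, snippets: List[Dict]) -> str:
    """Combine retrieved code snippets and the user's question into one prompt string."""
    parts = ["Answer the question using only the code excerpts below.", ""]
    for snippet in snippets:
        parts.append(
            f"File: {snippet['file']} (lines {snippet['start']}-{snippet['end']})"
        )
        parts.append(snippet["text"])
        parts.append("")
    parts.append(f"Question: {question}")
    return "\n".join(parts)


example_prompt = build_prompt(
    "What does load return in my code?",
    [{"file": "io_utils.py", "start": 3, "end": 6,
      "text": "def load(path):\n    with open(path) as f:\n        data = f.read()\n    return data"}],
)
```

Labeling each excerpt with its file name and line range gives the model enough information to refer back to specific locations in its answer.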
Step 9: Session Privacy
Keeping Your Data Local
All indexing and querying happens on your local machine. The index files are stored in the user’s home folder; on Windows, for example, the path looks something like C:\Users\Furqan\.bito\localcodesearch. This keeps your code and session history private and secure.
Step 10: Safeguarding Data
Ensuring Confidentiality
Bito is committed to privacy. All LLM accounts it uses are under strict agreements that prevent your data from being recorded, logged, or used for training.
Step 11: Handling Hallucination
Reducing AI Fabrication
Bito is designed to minimize AI 'hallucinations', or fabrications, so that the answers you receive are based on your actual code. Hallucination cannot be eliminated completely, partly because the same generative ability is what lets a model construct answers beyond the data it has seen, but Bito strives to keep it in check, especially when dealing with your local code.
With these steps, Bito provides a robust and privacy-conscious method for indexing and understanding your code, simplifying navigation and enhancing productivity in your development projects.