Indexing

Indexing involves breaking down a source code file into smaller chunks and converting these chunks into embeddings that can be stored in a vector database. Bito indexes your entire codebase locally (on your machine) to understand it and provide answers tailored to your code.

Learn more about Bito's "AI that Understands Your Code" feature.

How Bito Indexes Your Code

The steps below show how Bito indexes your code so that each query is answered with precise, contextually relevant information. Bito breaks your code into chunks, converts them into embeddings stored on your machine, and then passes the matching context to AI models that answer your question.

Here's how the magic happens:

Step 1: Chunk Breakdown

Dividing Code into Pieces

Bito starts by breaking down your source code files into smaller sections, known as 'chunks'. It’s like cutting up a long text into paragraphs to make it more manageable. Each chunk represents a piece of your code that can be individually indexed and analyzed.
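To make the idea concrete, here is a minimal sketch of line-based chunking. It is not Bito's actual chunking logic, which this page does not describe. The Chunk class, the chunk_source function, and the 40-line default are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    file_name: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    text: str


def chunk_source(file_name: str, source: str, max_lines: int = 40) -> list[Chunk]:
    """Split source text into consecutive chunks of at most max_lines lines.

    Assumption: fixed-size line windows. A real indexer may split on
    functions, classes, or token counts instead.
    """
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    lines = source.splitlines()
    chunks = []
    for start in range(0, len(lines), max_lines):
        block = lines[start:start + max_lines]
        # Record where the chunk sits in the file so it can be located later.
        chunks.append(Chunk(file_name, start + 1, start + len(block), "\n".join(block)))
    return chunks
```

Each chunk keeps its start and end line, which is the location information the later steps rely on.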

Step 2: Indexing Each Chunk

Creating a Searchable Reference

After breaking down the file, each chunk is indexed, similar to creating a catalog entry. This step is crucial because it lets Bito locate the code segment efficiently later on.

Step 3: Generating Embeddings

Translating Code into Numeric Vectors

For every chunk, Bito generates a numeric vector or “embedding”. This process, which can be done using OpenAI or alternative open-source embedding models, translates the code into a mathematical representation. The idea is to create a form that can be easily compared and matched with other code chunks.
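The sketch below shows what "text in, vector out" looks like. It uses a toy hashing scheme as a stand-in for a real embedding model, so it captures token overlap only, not meaning. The toy_embedding function and its 64-dimension default are hypothetical and are not what Bito or OpenAI use.

```python
import hashlib
import math
import re


def toy_embedding(text: str, dims: int = 64) -> list[float]:
    """Map text to a unit-length vector by hashing its tokens into buckets.

    Assumption: this is a toy stand-in for a learned embedding model.
    """
    vector = [0.0] * dims
    for token in re.findall(r"[a-z_]\w*|\d+", text.lower()):
        # sha256 keeps the bucket stable across runs (built-in hash() is salted).
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dims
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    # Normalize so vectors of different-sized chunks are comparable.
    return [v / norm for v in vector] if norm else vector
```

The same function would be applied to code chunks and to queries, so both end up in the same vector space and can be compared.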

Step 4: Storing the Vectors

Saving the Essential Data

These embeddings are then stored in an index file on your machine. This index file is like a detailed directory, listing the file name, the location of the chunk within the file (start and end), and the embedding vector for each piece of code.
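As an illustration, an index entry with the fields listed above could be stored as JSON. The actual on-disk format of Bito's index file is not documented here, so the field names and the make_record, save_index, and load_index helpers are assumptions.

```python
import json
from pathlib import Path


def make_record(file_name: str, start: int, end: int, embedding: list[float]) -> dict:
    """Build one index entry: file name, chunk location, and embedding vector."""
    return {"file": file_name, "start": start, "end": end, "embedding": embedding}


def save_index(path: Path, records: list[dict]) -> None:
    """Write all index entries to a single JSON file (hypothetical format)."""
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")


def load_index(path: Path) -> list[dict]:
    """Read the index entries back from disk."""
    return json.loads(path.read_text(encoding="utf-8"))
```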

Step 5: Query Embedding

Understanding Your Questions

When you ask a question in Bito's chatbox, the AI checks whether the question contains specific keywords such as "my code" or "my project". If it does, Bito generates a numeric vector for your query, mirroring the process used for code chunks.

The complete list of these keywords is given on our Available Keywords page.
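A keyword check of this kind can be sketched in a few lines. Only "my code" and "my project" come from the text above. The matching rule (case-insensitive, whole words) and the mentions_codebase name are assumptions, not Bito's documented behavior.

```python
import re

# Only these two keywords are named on this page; the full list is on the
# Available Keywords page and is deliberately not guessed here.
TRIGGER_KEYWORDS = ("my code", "my project")

_KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in TRIGGER_KEYWORDS) + r")\b"
)


def mentions_codebase(question: str) -> bool:
    """Return True if the question contains one of the trigger keywords."""
    # Lowercase and collapse whitespace so "My   Code" still matches.
    normalized = re.sub(r"\s+", " ", question.lower())
    return _KEYWORD_PATTERN.search(normalized) is not None
```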

Step 6: Finding the Nearest Neighbor

Matching Your Query with Code

Using the query's vector, Bito searches the index for the code chunks whose embeddings are closest to it. This step identifies the sections of your codebase that are most relevant to your question.
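A nearest-neighbor lookup can be sketched as a brute-force scan. Cosine similarity is a common choice for comparing embeddings, but this page does not say which metric Bito uses, so treat it as an assumption. The record layout ("file", "start", "end", "embedding") and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_chunks(query_vector: list[float], index: list[dict], top_k: int = 3):
    """Return the top_k (score, record) pairs, best match first."""
    scored = [(cosine_similarity(query_vector, rec["embedding"]), rec) for rec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

A linear scan like this is fine for a small local index. Larger indexes typically use an approximate nearest-neighbor structure instead.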

Step 7: Contextualization

Building a Bigger Picture

Identifying chunks is just part of the process. Bito ensures that these chunks make sense in the broader context of your code. If necessary, it expands the search to include complete functions or related code segments, creating a fuller, more accurate context.
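One way to expand a matched chunk to a complete function is to consult a syntax tree. The sketch below does this for Python source only, using the standard ast module. Bito's own expansion logic is not described here, and expand_to_enclosing_function is a hypothetical helper.

```python
import ast


def expand_to_enclosing_function(source: str, start: int, end: int) -> tuple[int, int]:
    """Grow a 1-based line range to the innermost function or class containing it.

    Returns the range unchanged if no definition encloses it.
    """
    best = (start, end)
    best_size = None
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.lineno <= start and end <= node.end_lineno:
                size = node.end_lineno - node.lineno
                # Prefer the smallest enclosing definition (the innermost one).
                if best_size is None or size < best_size:
                    best, best_size = (node.lineno, node.end_lineno), size
    return best
```

For example, a match on a single line inside a function body is widened to the whole function, so the language model sees the signature and the return statement together.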

Step 8: Leveraging Language Models

Consulting the AI Experts

With the context in hand, Bito consults a language model, either basic (GPT-4o mini and similar models) or advanced (GPT-4o, Claude 3.5 Sonnet, and other best-in-class AI models), to interpret the code within that context and provide an accurate response to your query.
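Passing context to a language model usually comes down to assembling a prompt. The sketch below shows one plausible layout. The wording of the instruction, the chunk fields, and the build_prompt name are all assumptions, and no model API is called.

```python
def build_prompt(question: str, context_chunks: list[dict]) -> str:
    """Combine retrieved code chunks and the user's question into one prompt.

    Each chunk is a dict with "file", "start", "end", and "text" keys
    (a hypothetical layout, not Bito's internal format).
    """
    parts = ["Answer the question using only the code context below.", ""]
    for chunk in context_chunks:
        # Label each chunk with its origin so the answer can cite it.
        parts.append(f"File: {chunk['file']} (lines {chunk['start']}-{chunk['end']})")
        parts.append(chunk["text"])
        parts.append("")
    parts.append(f"Question: {question}")
    return "\n".join(parts)
```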

Step 9: Session Privacy

Keeping Your Data Local

All indexing and querying happens on your local machine. The index files are stored in the user’s home folder; on Windows, for example, the path looks something like C:\Users\Furqan\.bito\localcodesearch. Keeping everything local ensures that your code and session history remain private and secure.
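If you want to locate that folder from a script, the path can be built from the home directory. The .bito and localcodesearch names come from the Windows example above. That the same layout applies on other operating systems is an assumption, and index_directory is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def index_directory(home: Optional[Path] = None) -> Path:
    """Return the expected index folder under the given (or current) home folder."""
    base = home if home is not None else Path.home()
    # Folder names taken from the documented Windows example path.
    return base / ".bito" / "localcodesearch"
```

Calling index_directory() with no argument resolves against the current user's home folder.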

Step 10: Safeguarding Data

Ensuring Confidentiality

Bito is committed to privacy. All LLM accounts it uses are under strict agreements that prohibit your data from being used for training, recorded, or logged.

Step 11: Handling Hallucination

Reducing AI Fabrication

Bito is designed to minimize AI 'hallucinations', or fabrications, so that the answers you receive are based on your actual code. Hallucination cannot be eliminated completely, because the same generative behavior is what lets a model construct answers beyond the data it has seen, but Bito strives to keep it in check, especially when dealing with your local code.

With these steps, Bito provides a robust and privacy-conscious method for indexing and understanding your code, simplifying navigation and enhancing productivity in your development projects.


Last updated 10 months ago