[native_doc_dartifier] Experiment usage of RAG in concising bindings context #2472

Draft: wants to merge 9 commits into base: main

marshelino-maged (Contributor) commented Aug 1, 2025

Retrieval-Augmented Generation (RAG) experiment.
CI results are here: marshelino-maged#10

How does it work?

We start with a large set of documents; in our case, these are the class summaries.

  • We compute an embedding for each class summary using the gemini-embedding model.
  • We then store the embeddings in a vector DB (ChromaDB).
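The indexing step above can be sketched as follows. This is a minimal, self-contained illustration, not the code in this PR: `toy_embed` is a hypothetical stand-in for a call to the gemini-embedding model (it only hashes words into a fixed-size vector), and the plain dict `index` stands in for a ChromaDB collection. The three class summaries are shortened examples.

```python
import hashlib
import math
import re

DIM = 64  # toy dimensionality; a real embedding model returns far larger vectors


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for the embedding model: a hashed bag of words."""
    vec = [0.0] * DIM
    for token in re.findall(r"[A-Za-z]+", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit length, so a dot product is a cosine


# One document per class summary (shortened, illustrative).
class_summaries = {
    "FileReader": "class FileReader extends InputStreamReader",
    "BufferedReader": "class BufferedReader extends Reader",
    "Example": "class Example extends jni$_.JObject",
}

# Stand-in for the vector DB: id -> (embedding, original document).
index = {
    class_id: (toy_embed(summary), summary)
    for class_id, summary in class_summaries.items()
}
```

Keeping the original summary next to its vector matters because the vector is only used for lookup; the text is what eventually goes to the LLM.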

Our question (the query) is, in our case, the Java snippet that we want to translate.

  • We compute an embedding for this query using the gemini-embedding model.
  • We query the vector DB with this embedding and ask it to return the K most relevant documents.
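A minimal sketch of the top-K lookup, assuming cosine similarity as the relevance measure (the distance metric actually configured in the vector DB is not stated here). The three-dimensional vectors and the `top_k` helper are made up for illustration; in the experiment the vectors come from the embedding model and the search is delegated to the vector DB.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], stored: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        stored,
        key=lambda doc_id: cosine(query_vec, stored[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up embeddings, just to exercise the ranking.
stored = {
    "FileReader": [0.9, 0.1, 0.0],
    "BufferedReader": [0.8, 0.3, 0.1],
    "Example": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of the Java snippet

print(top_k(query, stored, k=2))  # ['FileReader', 'BufferedReader']
```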

Those K documents form the bindings context summary that we give to the LLM when translating.
[Image: RAG]
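As a rough sketch of how the K retrieved documents could be handed to the LLM, the summaries can be concatenated into the context part of the translation prompt. The prompt wording, the Dart target mentioned in it, and the `build_prompt` helper are assumptions made for illustration; the template actually used is not shown here.

```python
def build_prompt(java_snippet: str, retrieved_summaries: list[str]) -> str:
    """Combine the top-K class summaries and the snippet into one prompt string."""
    context = "\n".join(retrieved_summaries)
    return (
        "Available bindings (retrieved summaries):\n"
        f"{context}\n\n"
        # Hypothetical instruction text; the real prompt may differ.
        "Translate this Java snippet to Dart using the bindings above:\n"
        f"{java_snippet}\n"
    )


prompt = build_prompt(
    "Example example = new Example();",
    [
        "class Example extends jni$_.JObject",
        "class Example$Operation extends jni$_.JObject",
    ],
)
print(prompt)
```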

Experiment Results:

We generated JNI bindings for the classes below: 286 classes in total, amounting to 35K tokens.

classes:
    - "java.io"
    - "com"

For this snippet:

Boolean useEnums() {
    Example example = new Example();
    Boolean isTrueUsage = example.enumValueToString(Operation.ADD) == "Addition";
    return isTrueUsage;
}

Number of tokens in the RAG summary: 851

Top 10 classes retrieved

  Query Results:
  class Example extends jni$_.JObject 
  class Example$Operation extends jni$_.JObject 
  class $Example$Operation$Type extends jni$_.JObjType<Example$Operation> 
  class $Example$Operation$NullableType extends jni$_.JObjType<Example$Operation?> 
  class $Example$Type extends jni$_.JObjType<Example> 
  class $Example$NullableType extends jni$_.JObjType<Example?> 
  abstract class $ObjectInputValidation 
  class OptionalDataException extends ObjectStreamException 
  abstract class $FilenameFilter 
  abstract class $FileFilter 

For this snippet:

public class ReadFile {
    public static void main(String[] args) {
        String filePath = "my-file.txt";
        try (
            FileReader fileReader = new FileReader(filePath);
            BufferedReader bufferedReader = new BufferedReader(fileReader)
        ) {
            String line = bufferedReader.readLine();
            System.out.println("The first line of the file is: " + line);
        } catch (IOException e) {
            System.err.println("An error occurred while reading the file: " + e.getMessage());
        }
    }
}

Number of tokens in the RAG summary: 2009

Top 10 classes retrieved

  Query Results:
  class FileReader extends InputStreamReader 
  class BufferedReader extends Reader 
  class FileInputStream extends InputStream 
  class RandomAccessFile extends jni$_.JObject 
  class LineNumberReader extends BufferedReader 
  class InputStreamReader extends Reader 
  class DataInputStream extends FilterInputStream 
  class StringReader extends Reader 
  class StringBufferInputStream extends InputStream 
  class ObjectInputStream extends InputStream 
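To put the two RAG summary sizes in perspective, the arithmetic below compares them with the 35K-token figure quoted above. It assumes that figure is the size of the full, unfiltered bindings summary and treats 35K as exactly 35,000, so the percentages are approximate.

```python
FULL_SUMMARY_TOKENS = 35_000  # "35K" from the description, taken as exact (assumption)

rag_summary_tokens = {
    "useEnums snippet": 851,
    "ReadFile snippet": 2009,
}

for name, tokens in rag_summary_tokens.items():
    share = tokens / FULL_SUMMARY_TOKENS
    print(f"{name}: {tokens} tokens, {share:.1%} of the full summary")

# useEnums snippet: 851 tokens, 2.4% of the full summary
# ReadFile snippet: 2009 tokens, 5.7% of the full summary
```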

Labels: package:native_doc_dartifier, type-infra