Merge pull request #15 from NIGMS/llm-integration

kyleoconnell-NIH · web-flow · commit 5e8c6e57685e · 2025-05-22T09:21:35.000-04:00
Added util folder with gemini.py, added Gemini to RNA-seq module, and…
diff --git a/GoogleCloud/01-RNA-Seq/RNA-seq.ipynb b/GoogleCloud/01-RNA-Seq/RNA-seq.ipynb
@@ -141,7 +141,12 @@
     "\n",
     "* **Data visualization and interpretation:** The notebook emphasizes creating and interpreting various visualizations, including boxplots, mean-standard deviation plots, PCA plots, histograms of p-values, MA plots, and heatmaps.\n",
     "\n",
-    "* **Using bioinformatics tools:** The notebook shows how to use Nextflow (a workflow management system) for RNA-seq data processing and R for statistical analysis and visualization."
+    "* **Using bioinformatics tools:** The notebook shows how to use Nextflow (a workflow management system) for RNA-seq data processing and R for statistical analysis and visualization.\n",
+    "\n",
+    "<div class=\"alert alert-block alert-success\">\n",
+    "    <i class=\"fa fa-hand-paper-o\" aria-hidden=\"true\"></i>\n",
+    "    <b>Tip: </b>  If you're having trouble with any part of this tutorial, you can ask Gemini (Google's generative AI model) for help using the optional cell at the bottom of this module.\n",
+    "</div>  "
    ]
   },
   {
@@ -755,6 +760,41 @@
     "<hr style=\"border:2px solid Orange\">"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "4c0cd032-0b9e-4dfe-9708-a3a5e520cf31",
+   "metadata": {},
+   "source": [
+    "## Gemini (Optional)\n",
+    "--------\n",
+    "\n",
+    "If you're having trouble with this submodule (or others within this tutorial), you can ask Gemini, Google's generative AI model, for help by running the cell below. The cell installs the required packages and displays a prompt box with a Submit button; responses appear beneath it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4b345c9c-13b0-4486-81ee-f12399714550",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Ensure you have the necessary libraries installed\n",
+    "!pip install -q google-generativeai google-cloud-secret-manager\n",
+    "!pip install -q git+https://github.com/NIGMS/NIGMS-Sandbox-Repository-Template.git#subdirectory=llm_integrations\n",
+    "!pip install -q ipywidgets\n",
+    "\n",
+    "import sys\n",
+    "import os\n",
+    "util_path = os.path.join(os.getcwd(), 'util')\n",
+    "if util_path not in sys.path:\n",
+    "    sys.path.append(util_path)\n",
+    "\n",
+    "from gemini import run_gemini_widget, create_gemini_chat_widget \n",
+    "from IPython.display import display\n",
+    "\n",
+    "run_gemini_widget()"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "919d00c4-2247-4f94-aa4f-b7cf6c0334d3",
@@ -777,9 +817,9 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "conda_python3",
+   "display_name": "Python 3 (ipykernel) (Local)",
    "language": "python",
-   "name": "conda_python3"
+   "name": "conda-base-py"
   },
   "language_info": {
    "codemirror_mode": {
diff --git a/GoogleCloud/01-RNA-Seq/util/gemini.py b/GoogleCloud/01-RNA-Seq/util/gemini.py
@@ -0,0 +1,94 @@
+import gemini_helper.build as builder
+import google.generativeai as genai
+import ipywidgets as widgets
+
+_model = None # Private module-level variable for the model
+
+def _initialize_model():
+    """Initializes the Gemini model if not already initialized."""
+    global _model
+    if _model is None:
+        try:
+            key = builder.get_api_key()
+            genai.configure(api_key=key)
+            _model = genai.GenerativeModel("gemini-1.5-flash")
+            print("Gemini model initialized successfully.")
+        except Exception as e:
+            print(f"Error initializing Gemini model: {e}")
+            _model = None # Ensure it's None if initialization fails
+    return _model
+
+# --- Widget Creation and Logic ---
+def create_gemini_chat_widget():
+    """
+    Creates and returns the ipywidgets (prompt box, submit button, output area) for a Gemini chat interface.
+    The widgets should be displayed in the calling environment (e.g., Jupyter Notebook).
+    """
+    model = _initialize_model()
+    if not model:
+        # If model failed to initialize, return a message widget
+        error_message = widgets.HTML("Failed to initialize Gemini model. Please check API key and configuration.")
+        return (error_message,) # Return as a tuple for consistency with display
+
+    # Create a text box for input
+    prompt_input = widgets.Text(
+        value='',
+        placeholder='Type your prompt here',
+        description='Prompt:',
+        disabled=False
+    )
+
+    # Create an output area for the response
+    response_output = widgets.Output()
+
+    # Create a button to submit the prompt
+    submit_button = widgets.Button(description="Submit")
+
+    # Define the function to handle the button click
+    def on_submit_button_clicked(b):
+        # Ensure the model is available (it should be if we reached here)
+        if not _model:
+            with response_output:
+                response_output.clear_output()
+                print("Model is not available.")
+            return
+
+        with response_output:
+            response_output.clear_output() # Clear previous output
+            prompt_text = prompt_input.value
+            if not prompt_text.strip():
+                print("Please enter a prompt.")
+                return
+
+            # Indicate processing
+            print("Generating response...")
+            try:
+                # Generate the response using the Gemini model
+                response = _model.generate_content(prompt_text)
+                response_output.clear_output() # Clear "Generating response..."
+                print(response.text)
+            except Exception as e:
+                response_output.clear_output() # Clear "Generating response..."
+                print(f"Error generating content: {e}")
+
+    # Attach the click event handler to the button
+    submit_button.on_click(on_submit_button_clicked)
+
+    # Return the widgets to be displayed by the caller
+    return prompt_input, submit_button, response_output
+
+def run_gemini_widget():
+    """
+    Convenience wrapper that creates the Gemini chat widgets and immediately displays them.
+    """
+    from IPython.display import display # Import display here, as it's for notebook environment
+
+    widgets_to_display = create_gemini_chat_widget()
+    if widgets_to_display: # Check if widgets were created
+        # create_gemini_chat_widget() always returns a tuple (a 1-tuple on error), so unpack it
+        if isinstance(widgets_to_display, tuple):
+            display(*widgets_to_display)
+        else:
+            display(widgets_to_display)
+    else:
+        print("Could not create Gemini widget.")
diff --git a/GoogleCloud/README.md b/GoogleCloud/README.md
@@ -37,4 +37,23 @@ Now, you can explore the tutorial and run code blocks from each Jupyter notebook
 
 As seen in the above figure, we have downloaded the data from the NCBI GEO website with accession number GSE173380. Sample data is already provided in the `gs://nigms-sandbox/nosi-und` Google Cloud bucket. There is no need to download the data again unless you want to run the optional Nextflow preprocessing step on the entire dataset (which could be computationally expensive). In the second step of submodule 1 and 2, Nextflow, in collaboration with Google Batch API and Vertex AI, is used to perform the preprocessing. Nextflow works as a workflow manager, which enables scalable and reproducible scientific workflows using containers. Google Life Sciences API is a suite of tools and services for managing, processing, and transforming life science data where it creates and manages clusters and virtual machines. It helps to split the job into multiple jobs and assigns each job to a set of designated virtual machines. Vertex AI, on the other hand, behaves like an interface to manage and execute the process. 
 
-After initial preprocessing using Nextflow, further preprocessing, normalization, clustering analysis, differential analysis, and visualization is done in Vertex AI's Jupyter notebook using the R kernel. The results are written in the current working directory inside the Vertex AI instance and transferred to cloud buckets for storage. In the fourth step, we will extract the data from step two and three to use for the multi-omics module's integration analysis. The integrative analysis is also performed using Vertex AI's Jupyter notebook using the R kernel. We will use multi-omics integrative techniques like correlation tests, overlaps and enrichment colocalization, functional and pathway relation, and motifs search. The results from these techniques will be explored in the notebook and then transferred to cloud storage for future reference.   
+After initial preprocessing using Nextflow, further preprocessing, normalization, clustering analysis, differential analysis, and visualization are done in Vertex AI's Jupyter notebook using the R kernel. The results are written to the current working directory inside the Vertex AI instance and transferred to cloud buckets for storage. In the fourth step, we will extract the data from steps two and three to use for the multi-omics module's integration analysis. The integrative analysis is also performed in Vertex AI's Jupyter notebook using the R kernel. We will use multi-omics integrative techniques like correlation tests, overlaps and enrichment colocalization, functional and pathway relation, and motif search. The results from these techniques will be explored in the notebook and then transferred to cloud storage for future reference.
+
+## Gemini (Optional)
+
+Gemini, Google's generative AI model, is available as an optional assistant for this tutorial. To use it, see the Gemini cell at the end of Submodule 1 (01-RNA-Seq), or run the following code in any submodule notebook. The code imports from util/gemini.py, so copy the util folder (containing gemini.py) into the same directory as the notebook where you run it.
+
+```
+!pip install -q google-generativeai google-cloud-secret-manager ipywidgets
+!pip install -q git+https://github.com/NIGMS/NIGMS-Sandbox-Repository-Template.git#subdirectory=llm_integrations
+
+import sys
+import os
+util_path = os.path.join(os.getcwd(), 'util')
+if util_path not in sys.path:
+    sys.path.append(util_path)
+
+from gemini import run_gemini_widget
+
+run_gemini_widget()
+```
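
Review note on the README snippet: it relies on a `util` folder containing `gemini.py` sitting next to the notebook, and a missing folder only surfaces later as an `ImportError` on `from gemini import ...`. One possible way to fail earlier with a clearer message is sketched below. `add_util_to_path` is a hypothetical helper, not part of this PR, and the sketch assumes only the `util/gemini.py` layout described in the README.

```python
import os
import sys


def add_util_to_path(notebook_dir=None):
    """Append <notebook_dir>/util to sys.path if it contains gemini.py.

    Hypothetical helper (not part of this PR). Returns the util path on
    success, or None when the folder or the file is missing.
    """
    base = notebook_dir if notebook_dir is not None else os.getcwd()
    util_path = os.path.join(base, "util")

    # Check for the file up front so the user gets a readable hint
    # instead of a later ImportError.
    if not os.path.isfile(os.path.join(util_path, "gemini.py")):
        print(f"gemini.py not found in {util_path}; "
              "copy the util folder next to this notebook.")
        return None

    # Same guard as the README snippet: avoid duplicate sys.path entries.
    if util_path not in sys.path:
        sys.path.append(util_path)
    return util_path
```

In a notebook cell, this could be used by calling `add_util_to_path()` first and importing `run_gemini_widget` from `gemini` only when the call returns a path.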