Installation Guide 🚀

This guide will walk you through the process of installing 🧪 MetaboT 🍵 and its dependencies.

Prerequisites 📋

Before installing 🧪 MetaboT 🍵, ensure you have the following installed:

pip (Python package installer) — Install pip
conda — Install Miniconda
Git — Install Git
LLM API Key — Get an API key for your chosen language model (OpenAI, DeepSeek, or Claude)
WSL (for Windows users) — Install WSL

Clone the Repository and Switch to the `dev` Branch 📥

git clone https://github.com/holobiomicslab/MetaboT.git
cd MetaboT
git checkout dev

Create and Activate the Conda Environment ⚙️

For macOS:

conda env create -f environment.yml
conda activate metabot

For Linux:

sudo apt-get update
sudo apt-get install -y python3-dev build-essential
conda env create -f environment.yml
conda activate metabot

For Windows (using WSL):

Install WSL if you haven't already (run this from PowerShell or Windows Command Prompt as administrator):
```
wsl --install
```

Open WSL and install the required packages:

sudo apt-get update
sudo apt-get install -y python3-dev build-essential

Install Miniconda in WSL:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc

Create and activate the conda environment:

conda env create -f environment.yml
conda activate metabot

Install Dependencies 📦

pip install -r requirements.txt

Environment Variables 🔑

Create a .env file in the root directory with the following variables:

# Optional: API Keys for external services
OPENAI_API_KEY=your_openai_api_key  # If using OpenAI service
DEEPSEEK_API_KEY=your_deepseek_api_key # If using DeepSeek API service
OVHCLOUD_API_KEY=your_ovhcloud_api_key # If using the OVHcloud services
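Before running MetaboT, it can help to confirm which keys are actually defined in the .env file. The following is only a sketch using the standard library: the helper names `parse_env_text` and `missing_keys` are illustrative and not part of MetaboT, and the parser handles only simple KEY=value lines. It strips an inline comment when the `#` follows whitespace, as in the sample above; the loader MetaboT itself uses may treat edge cases differently.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse simple KEY=value lines; ignore blanks and full-line comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline comment such as "value  # note" (whitespace before '#').
        for sep in (" #", "\t#"):
            if sep in value:
                value = value.split(sep, 1)[0]
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required):
    """Return the names in `required` that are absent or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    # List only the keys for the providers you actually use.
    print("Missing:", missing_keys(env, ["OPENAI_API_KEY"]))
```

Run it from the repository root so that the relative `.env` path resolves, and adjust the list of required keys to the providers you have configured.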

Language Model Configuration 🤖

By default, all agents in MetaboT use OpenAI models, but you can configure different models for each agent. The current implementation supports:

OpenAI
DeepSeek
Claude (Anthropic)
Llama (via OVHcloud)

Adding New Models

To add a new model using LiteLLM:

Add a new section in app/config/params.ini:

[llm_litellm_your_model_name]
temperature=0.0
id=your-provider/model-name  # As specified in https://docs.litellm.ai/docs/providers
max_retries=3

Add your provider and API key mapping in app/core/main.py:

API_KEY_MAPPING = {
    "deepseek": "DEEPSEEK_API_KEY",
    "ovh": "OVHCLOUD_API_KEY",
    "openai": "OPENAI_API_KEY",
    "huggingface": "HUGGINGFACE_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "your-provider": "YOUR_PROVIDER_API_KEY"  # Add your mapping here
}

Modify the provider detection in the create_litellm_model function:

if model_id.startswith("deepseek"):
    provider = "deepseek"
elif model_id.startswith("gpt"):
    provider = "openai"
    model_name = f"openai/{model_id}"
elif model_id.startswith("your-prefix"):  # Add your model prefix detection
    provider = "your-provider"

The function automatically handles:

Provider detection based on model ID prefix
API key retrieval from environment variables
Basic parameters (temperature, max_retries)
Optional base URL configuration
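The first two behaviours in that list, prefix-based provider detection and API key retrieval, can be sketched together as follows. This is an illustration rather than the actual `create_litellm_model` code: the function name `resolve_provider_and_key`, the `PREFIX_TO_PROVIDER` table (including the `claude` prefix), and the error handling are assumptions, and only a subset of the `API_KEY_MAPPING` shown above is reproduced.

```python
import os

# Subset of the mapping shown above (provider -> environment variable name).
API_KEY_MAPPING = {
    "deepseek": "DEEPSEEK_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

# Hypothetical prefix table; the excerpt above uses an if/elif chain instead,
# and the "claude" prefix is an assumption made for this sketch.
PREFIX_TO_PROVIDER = {
    "deepseek": "deepseek",
    "gpt": "openai",
    "claude": "anthropic",
}


def resolve_provider_and_key(model_id, environ=None):
    """Return (provider, api_key) for a model ID, or raise a clear error."""
    environ = os.environ if environ is None else environ
    for prefix, provider in PREFIX_TO_PROVIDER.items():
        if model_id.startswith(prefix):
            env_var = API_KEY_MAPPING[provider]
            api_key = environ.get(env_var)
            if not api_key:
                raise KeyError(f"{env_var} is not set for provider '{provider}'")
            return provider, api_key
    raise ValueError(f"No provider registered for model ID '{model_id}'")
```

Failing early with the name of the missing environment variable tends to be easier to debug than an authentication error returned later by the provider.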

Configuring Models for Different Agents

To use different models for different agents, modify app/config/langgraph.json. In the agents section, specify llm_choice with the name of your model section from params.ini:

{
  "agents": [
    {
      "name": "Entry_Agent",
      "path": "app.core.agents.entry.agent",
      "llm_choice": "llm_litellm_your_model_name"
    },
    {
      "name": "Validator",
      "path": "app.core.agents.validator.agent",
      "llm_choice": "llm_litellm_different_model"
    }
  ]
}
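Because each `llm_choice` has to match a section name in params.ini, a small consistency check can catch typos before the agents start. The sketch below assumes the two files have the shapes shown in this guide; `find_unknown_llm_choices` is a hypothetical helper, not a MetaboT function. Section names are read with Python's `configparser`, and `inline_comment_prefixes` is set so that trailing `#` comments, as on the sample `id` line, are not treated as part of a value.

```python
import configparser
import json
from pathlib import Path


def find_unknown_llm_choices(params_ini_text, langgraph_json_text):
    """Return (agent name, llm_choice) pairs with no matching params.ini section."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(params_ini_text)
    sections = set(parser.sections())
    agents = json.loads(langgraph_json_text).get("agents", [])
    return [
        (agent.get("name"), agent["llm_choice"])
        for agent in agents
        # Agents without an llm_choice are skipped rather than reported.
        if agent.get("llm_choice") and agent["llm_choice"] not in sections
    ]


if __name__ == "__main__":
    ini_path = Path("app/config/params.ini")
    json_path = Path("app/config/langgraph.json")
    if ini_path.exists() and json_path.exists():
        print(find_unknown_llm_choices(ini_path.read_text(), json_path.read_text()))
```

An empty list means every configured `llm_choice` points at an existing section.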

SPARQL Endpoint Configuration 🌐

Configure your SPARQL endpoint exclusively by setting the KG_ENDPOINT_URL variable in your .env file.

Verify Installation ✅

To verify the installation, execute the following command:

python app/tests/installation_test.py

This command runs the agent workflow end to end: it builds the RDF graph from the endpoint specified by the KG_ENDPOINT_URL variable in your .env file, instantiates the configured language models, and executes one of the predefined standard queries. A successful run confirms that the core components, including graph management and SPARQL query generation, are configured and working together.

Common Issues 🐞

Issue: SPARQL Endpoint Connection

If SPARQL queries fail:

Check if the SPARQL endpoint is accessible.
Verify that the KG_ENDPOINT_URL variable in your .env file is correctly set.
Ensure proper network access/firewall settings.
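The first of these checks can be scripted. The sketch below sends a trivial `ASK` query over HTTP GET, following the standard SPARQL 1.1 Protocol, and reports whether the endpoint answers with HTTP 200. The function names are illustrative and not part of MetaboT, and the sketch assumes an endpoint that accepts unauthenticated GET queries; some deployments may require POST or credentials.

```python
import urllib.parse
import urllib.request


def build_ask_request(endpoint_url):
    """Build a GET request that sends a trivial ASK query to the endpoint."""
    query = urllib.parse.urlencode({"query": "ASK { ?s ?p ?o }"})
    separator = "&" if "?" in endpoint_url else "?"
    return urllib.request.Request(
        f"{endpoint_url}{separator}{query}",
        headers={"Accept": "application/sparql-results+json"},
    )


def endpoint_is_reachable(endpoint_url, timeout=10):
    """Return True if the endpoint answers the ASK query with HTTP 200."""
    try:
        request = build_ask_request(endpoint_url)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts are OSError subclasses
        return False
```

For example, `endpoint_is_reachable(os.environ["KG_ENDPOINT_URL"])` (after `import os` and loading your .env values into the environment) returns `False` for connection errors, timeouts, and non-200 responses alike, so it indicates that something is wrong but not what.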

Mass Spectrometry Data 🔬

By default, 🧪 MetaboT 🍵 connects to the public ENPKG endpoint, which hosts an open, annotated mass spectrometry dataset derived from a chemodiverse collection of 1,600 plant extracts. This default dataset lets you explore all features of 🧪 MetaboT 🍵 immediately, without any custom data conversion. To use 🧪 MetaboT 🍵 on your own mass spectrometry data, the processed and annotated results must first be converted into a knowledge graph format using the ENPKG tool. For details on converting your own data, refer to the Experimental Natural Products Knowledge Graph library and the associated publication.

Set your SPARQL endpoint by configuring the KG_ENDPOINT_URL variable in your .env file. If you are deploying a local endpoint that requires authentication, add the following variables to your .env file:

SPARQL_USERNAME=your_username
SPARQL_PASSWORD=your_password
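If you want to test an authenticated endpoint by hand, the two variables can be turned into a request header. The sketch below assumes the endpoint uses HTTP Basic authentication, which this guide does not state; `basic_auth_headers` is a hypothetical helper and does not show how MetaboT itself passes the credentials.

```python
import base64
import os


def basic_auth_headers(environ=None):
    """Return an Authorization header if both SPARQL credentials are set."""
    environ = os.environ if environ is None else environ
    user = environ.get("SPARQL_USERNAME")
    password = environ.get("SPARQL_PASSWORD")
    if not (user and password):
        # No credentials configured: send the request without authentication.
        return {}
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

The returned dictionary can be passed as the `headers` argument of `urllib.request.Request` or of an HTTP client of your choice.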

Additionally, for the generated SPARQL queries to reflect the schema of your knowledge graph, you must describe its structure in detail by updating the prompts in:

app/core/agents/validator/prompt.py
The SPARQL generation chain in app/core/agents/sparql/tool_sparql.py

Support 🛠️

If you encounter any issues during installation:

Check our GitHub Issues for similar problems.
Create a new issue with detailed information about your setup and the error.

Next Steps

Follow the Quick Start Guide to begin using 🧪 MetaboT 🍵.
Review the Configuration Guide for detailed setup options.
Check out Example Usage for practical applications.