# Quick Start Guide
Welcome to the Quick Start Guide for 🧪 MetaboT 🍵. This guide will help you quickly run and test the application.
Try the MetaboT Web App Demo at https://metabot.holobiomicslab.cnrs.fr (no installation needed!)
The demo provides access to an open dataset of 1,600 plant extracts. You can explore metabolomics data and ask questions about the dataset directly through the web interface.
## Prerequisites (For Local Installation)
Before you begin, ensure that you have:
- Completed the Installation Guide
- Set the necessary environment variables in your `.env` file:
    - API key for your chosen language model:
        - `OPENAI_API_KEY` if using OpenAI
        - `DEEPSEEK_API_KEY` if using DeepSeek
        - `CLAUDE_API_KEY` if using Claude
    - SPARQL endpoint configuration:
        - `KG_ENDPOINT_URL` (required)
        - `SPARQL_USERNAME` and `SPARQL_PASSWORD` (if your endpoint requires authentication)
- Activated your Python virtual environment
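A minimal `.env` file covering the variables above might look like this (all values are placeholders; set only the API key matching your chosen model):

```
# LLM API key (set the one matching your chosen model)
OPENAI_API_KEY=your-api-key

# Knowledge-graph SPARQL endpoint (required)
KG_ENDPOINT_URL=https://enpkg.commons-lab.org/graphdb/repositories/ENPKG

# Optional: only if your endpoint requires authentication
SPARQL_USERNAME=your-username
SPARQL_PASSWORD=your-password
```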
## Running a Standard Query
🧪 MetaboT 🍵 includes several predefined queries that demonstrate its capabilities; these questions can be found here. For example, to run the first standard query (which counts features with matching SIRIUS/CSI:FingerID and ISDB annotations), execute:
What this does:
- Loads the default configuration and dataset
- Executes the first query from a list of standard queries
- Returns insights based on RDF graph analysis
## Running a Custom Query
You can also run custom queries tailored to your research needs. For example, to query SIRIUS structural annotations for a specific plant:
```shell
python -m app.core.main -c "What are the SIRIUS structural annotations for Tabernaemontana coffeoides?"
```
- Replace the query text in quotes with your desired question.
- Ensure that the query is relevant to the metabolomics data available in your configuration.
## Running in Docker 🐳
If you prefer to run 🧪 MetaboT 🍵 within a Docker container, follow these steps:
- Build the Docker Image: Ensure Docker and docker-compose are installed, then run:
- Run the Application in Docker: To execute the first standard query, run:
This command starts the container and runs the application accordingly. You can adjust the command as needed.
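The two steps above might look like the commands below. This is a sketch: the service name `metabot` is a guess, so check the repository's `docker-compose.yml` for the actual service name and default command.

```shell
# Build the image defined in docker-compose.yml
docker-compose build

# Run the application inside the container
# ("metabot" is an assumed service name -- adjust to match your compose file)
docker-compose run metabot python -m app.core.main -c "Your question here"
```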
## Workflow Overview
🧪 MetaboT 🍵 leverages a multi-agent workflow architecture to process queries efficiently:
- Entry Agent: Processes the incoming query and routes it to the appropriate system.
- Validator Agent: Immediately verifies that the incoming query is pertinent to the knowledge graph, ensuring its alignment with domain-specific schema.
- Supervisor Agent: Oversees and coordinates all processing steps within the workflow.
- ENPKG Agent: Handles domain-specific data processing related to metabolomics.
- SPARQL Agent: Generates and executes queries against the RDF knowledge graph.
- Interpreter Agent: Interprets and formats the query results for user readability.
This modular design allows 🧪 MetaboT 🍵 to be extended and customized for various research scenarios.
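The agent pipeline above can be sketched as a simple chain of handler functions passing a shared state along. This is an illustrative toy, not MetaboT's actual implementation, and every function name below is hypothetical:

```python
# Toy sketch of a multi-agent pipeline (hypothetical names, not MetaboT's
# real API): each agent transforms a shared state dict and hands it on.

def entry_agent(state):
    # Route the incoming query (trivial keyword routing for illustration).
    state["route"] = "kg" if "annotation" in state["query"].lower() else "other"
    return state

def validator_agent(state):
    # Check that the query is pertinent to the knowledge graph.
    keywords = ("annotation", "feature", "bioassay")
    state["valid"] = any(k in state["query"].lower() for k in keywords)
    return state

def sparql_agent(state):
    # A real agent would generate SPARQL from the question; here we stub it.
    state["sparql"] = "SELECT (COUNT(?f) AS ?n) WHERE { ?f a :LCMSFeature }"
    return state

def interpreter_agent(state):
    # Format the result for the user.
    state["answer"] = f"Generated query: {state['sparql']}"
    return state

def run_pipeline(query):
    state = {"query": query}
    for agent in (entry_agent, validator_agent, sparql_agent, interpreter_agent):
        state = agent(state)
    return state

result = run_pipeline("How many features have SIRIUS annotations?")
print(result["answer"])
```

In MetaboT the Supervisor Agent coordinates these steps dynamically rather than running a fixed chain, but the state-passing idea is the same.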
## Example Scenarios
### Basic Feature Analysis
Run a standard query to count LCMS features detected in negative ionization mode:
### Chemical Structure Analysis
Obtain structural annotations for a plant sample:
```shell
python -m app.core.main -c "What are the SIRIUS structural annotations for Tabernaemontana coffeoides?"
```
### Bioassay Results Exploration
Examine bioassay data for compounds in a specified extract:
```shell
python -m app.core.main -c "List the bioassay results at 10µg/mL against T.cruzi for lab extracts of Tabernaemontana coffeoides"
```
## Interacting with the Knowledge Graph
🧪 MetaboT 🍵 connects to a knowledge graph to enrich analysis:
```python
import os

from app.core.graph_management.RdfGraphCustom import RdfGraph

# Connect to the knowledge graph using the defined endpoint.
# If the SPARQL_USERNAME and SPARQL_PASSWORD environment variables are set,
# they will be used automatically for authentication.
graph = RdfGraph(
    query_endpoint="https://enpkg.commons-lab.org/graphdb/repositories/ENPKG",
    standard="rdf",
    auth=None,  # falls back to environment variables if available
)

# Or explicitly provide authentication:
auth = (os.getenv("SPARQL_USERNAME"), os.getenv("SPARQL_PASSWORD"))
graph = RdfGraph(
    query_endpoint="your_endpoint_url",
    standard="rdf",
    auth=auth,
)
```
Ensure that the `KG_ENDPOINT_URL` environment variable is correctly set to point to your graph database.
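To illustrate the kind of SPARQL the agents issue against the graph, here is a small helper that builds a COUNT query for LCMS features in a given ionization mode. The prefix, class, and property names are invented for illustration; the real ENPKG schema differs:

```python
def build_feature_count_query(ionization_mode):
    """Build a SPARQL COUNT query for LCMS features in a given mode.

    The IRIs below are placeholders, not the real ENPKG schema.
    """
    return f"""
PREFIX ex: <https://example.org/enpkg#>
SELECT (COUNT(?feature) AS ?count)
WHERE {{
  ?feature a ex:LCMSFeature ;
           ex:ionizationMode "{ionization_mode}" .
}}
""".strip()

query = build_feature_count_query("negative")
print(query)
```

The generated string could then be submitted to the endpoint through the `RdfGraph` connection shown above.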
## Advanced Configuration ⚙️
### LangSmith Integration
For enhanced tracking and monitoring of workflow runs, 🧪 MetaboT 🍵 uses LangSmith. An API key is required (free upon registration); set the corresponding variables in your `.env` file as follows:
- Set up LangSmith:
- Review LangSmith logs to access runtime details for debugging and auditing.
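LangSmith is typically enabled through environment variables; the names below are the standard LangChain tracing variables (verify against the LangSmith documentation for your version, and replace the key placeholder with your own):

```
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=<your-langsmith-api-key>
# Optional: group runs under a project name of your choice
LANGCHAIN_PROJECT=MetaboT
```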
### Custom Model Settings
Review and adjust the language model configurations in `app/config/params.ini`:
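As a purely hypothetical illustration of what such a file might contain (section and key names are guesses; consult the actual `app/config/params.ini` for the real ones):

```ini
[llm]
; Hypothetical keys -- check the real params.ini for the actual names
model = gpt-4o
temperature = 0.0
```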
## Troubleshooting
If you encounter issues, consider the following steps:
- Environment Variables: Verify that your chosen LLM API key (`OPENAI_API_KEY`, `DEEPSEEK_API_KEY`, `CLAUDE_API_KEY`, etc.) and `KG_ENDPOINT_URL` are correctly set in your `.env` file.
- Knowledge Graph Access: Confirm that the knowledge graph endpoint is reachable and correctly configured.
- Logs: Review terminal output for any error messages or warnings during execution.
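A quick sanity check for the first point can be scripted; this minimal sketch only checks that the variables listed above are present and non-empty:

```python
import os

# KG_ENDPOINT_URL is always required; at least one LLM key must be set.
REQUIRED = ["KG_ENDPOINT_URL"]
LLM_KEYS = ["OPENAI_API_KEY", "DEEPSEEK_API_KEY", "CLAUDE_API_KEY"]

def check_env(env=os.environ):
    """Return a list of missing or empty required variables."""
    missing = [var for var in REQUIRED if not env.get(var)]
    if not any(env.get(key) for key in LLM_KEYS):
        missing.append("one of " + "/".join(LLM_KEYS))
    return missing

# Example with an empty endpoint value: it is reported as missing.
problems = check_env({"KG_ENDPOINT_URL": "", "OPENAI_API_KEY": "sk-test"})
print(problems)
```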
## Next Steps ➡️
- Explore the User Guide for in-depth explanations of 🧪 MetaboT 🍵's components.
- Review the API Reference to understand function details.
- Examine the Examples for more advanced usage scenarios.