Advanced Bruin Workflows: AI Agents & Cloud Deployment
Goal: Explore advanced Bruin capabilities by integrating AI agents via the Model Context Protocol (MCP) to automate pipeline creation, and deploying our workflows to Bruin Cloud for fully managed orchestration.
1. Supercharging Development with Bruin MCP
One of the most powerful features of the modern data stack is the integration of Artificial Intelligence. Bruin supports the Model Context Protocol (MCP), allowing AI agents (like Cursor, GitHub Copilot, or Claude) to interact directly with your Bruin project.
What can an AI Agent do with Bruin MCP?
When you provide the Bruin MCP server to your IDE, the AI agent becomes a data engineering assistant that can:
- Read your database schema and understand your table structures.
- Write pipeline code, SQL transformations, and asset configurations.
- Run validation checks (`bruin validate`).
- Troubleshoot errors and debug issues.
- Execute queries and analyze data using natural language.
How to use it
In an editor like Cursor, you simply go to Settings → Tools & MCP → New MCP Server and add the command `bruin mcp`.
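Editors that support MCP typically persist this as a small JSON entry. A sketch of what the config could look like, assuming Cursor's `mcpServers` file layout (the server key `bruin` is an arbitrary label you choose):

```json
{
  "mcpServers": {
    "bruin": {
      "command": "bruin",
      "args": ["mcp"]
    }
  }
}
```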
Once enabled, you can give the agent a single prompt (e.g., “Build a 3-layer pipeline for NYC Taxi data using DuckDB”), and it will generate the Python ingestion scripts, write the staging SQL logic, configure the dependencies, and set up data quality checks.
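To make that concrete, here is a hedged sketch of the kind of staging asset such a prompt could produce: a SQL file with Bruin's embedded metadata block declaring the asset name, its upstream dependency, and column-level quality checks. The asset, table, and column names (`staging.trips`, `raw.trips`, `trip_id`) are illustrative, not from the original text:

```sql
/* @bruin
name: staging.trips
type: duckdb.sql
depends:
  - raw.trips
columns:
  - name: trip_id
    type: integer
    checks:
      - name: not_null
      - name: unique
@bruin */

-- Keep only well-formed trips for downstream marts
SELECT
    trip_id,
    pickup_datetime,
    total_amount
FROM raw.trips
WHERE total_amount >= 0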
Conversational Data Analysis
Beyond building pipelines, MCP allows you to “talk” to your data. After a pipeline runs, you can ask the agent:
“Query the staging table and tell me which day had the highest number of trips and total fare.”
The agent will write the SQL, execute it via Bruin, and return the answer in plain text.
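The query the agent writes for a question like this boils down to a per-day aggregation followed by a max. A minimal Python sketch of that same logic, using invented sample rows in place of the real staging table (dates and fares are made up for illustration):

```python
from collections import defaultdict
from datetime import date

# Hypothetical trip rows: (pickup_date, fare) -- invented sample data
trips = [
    (date(2024, 1, 1), 12.5),
    (date(2024, 1, 2), 9.0),
    (date(2024, 1, 2), 30.0),
    (date(2024, 1, 3), 7.5),
]

# Aggregate trip count and total fare per day, like a GROUP BY would
stats = defaultdict(lambda: [0, 0.0])
for day, fare in trips:
    stats[day][0] += 1
    stats[day][1] += fare

# Pick the day with the most trips, breaking ties by total fare
best_day, (n_trips, total_fare) = max(
    stats.items(), key=lambda kv: (kv[1][0], kv[1][1])
)
print(best_day, n_trips, total_fare)  # → 2024-01-02 2 39.0
```

The agent does the same thing in SQL (`GROUP BY` on the pickup date, then `ORDER BY count DESC LIMIT 1`) and summarizes the result in plain language.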
2. Deploying to Bruin Cloud
Running pipelines locally with DuckDB is great for development, but production requires robust, managed infrastructure. Bruin Cloud provides a fully managed environment for your data pipelines.
Connecting Your Infrastructure
Instead of manually configuring Airflow servers or CI/CD runners, Bruin Cloud connects directly to your existing tools:
- Version Control: Connect your GitHub repository directly to Bruin Cloud. It will automatically detect your Bruin projects and parse the pipeline definitions.
- Connections: Set up your data warehouse connections (e.g., BigQuery, Snowflake, Redshift) securely in the cloud UI. The connection names in the cloud will map directly to the connection names in your local `.bruin.yml` file.
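For reference, a hedged sketch of the local file whose connection names must match the cloud side. The exact keys depend on the connection type; the layout below (environments → connections) and the name `gcp-default` are assumptions for illustration:

```yaml
environments:
  default:
    connections:
      google_cloud_platform:
        - name: gcp-default            # must match the name used in Bruin Cloud
          project_id: my-project       # illustrative
          service_account_file: /path/to/key.json
```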
Deployment & Execution
Deploying a pipeline is as simple as clicking “Enable” in the Bruin Cloud UI.
- If your pipeline has a schedule (e.g., `@daily`), Bruin Cloud will automatically trigger the first run for the previous interval.
- It securely executes the Python and SQL assets precisely as they were defined in your repository.
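The schedule lives in the pipeline definition itself, so the cloud picks it up straight from your repository. A minimal sketch of such a pipeline file, assuming the common `pipeline.yml` shape (the pipeline name and start date are illustrative):

```yaml
name: nyc-taxi            # illustrative pipeline name
schedule: "@daily"        # Bruin Cloud triggers the first run for the previous interval
start_date: "2024-01-01"  # illustrative backfill boundary
```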
Monitoring & Governance
Once deployed, Bruin Cloud provides a single pane of glass for monitoring:
- Verify the status of every asset (Success/Failure).
- Review Data Quality check results (e.g., catching nulls or duplicates).
- Visualize the lineage across all assets to understand upstream and downstream impacts.
- Leverage built-in AI features to analyze data or ask questions about your pipeline’s health.
This concludes our 3-part series on Modern Data Platforms using Bruin. From understanding core concepts to building a local pipeline, and finally automating and deploying with AI and the Cloud, Bruin demonstrates how unified data platforms are simplifying the Data Engineering landscape.