Welcome to the Databricks Data Lakehouse Project by Data With Baraa.
This repository contains a complete, real-world Data Lakehouse implementation built on Databricks, including datasets, notebooks, SQL examples, and exercises. Everything here is designed to help you understand how modern data teams use Databricks in practice, from data ingestion and transformation to analytics-ready data products.
Build this project on your own first using the Notion roadmap.
Use this repository only as a reference if you get stuck.
Before starting, watch the Databricks Bootcamp, where I explain the architecture and decisions behind this project.
- 🧭 Notion Roadmap: Open guide
- ▶️ Databricks Bootcamp: Watch on YouTube
- 🎉 Finished? Share it on LinkedIn. Let’s celebrate!
This project follows the Medallion Architecture:
- Bronze layer: raw data ingestion, with schema inference and storage as Delta tables
- Silver layer: data cleaning and standardization, plus type casting and validation
- Gold layer: dimensional data model (business transformation), ready for BI and analysis
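To make the three layers concrete, here is a minimal plain-Python sketch of the Bronze → Silver → Gold flow. It is not the project's Databricks code (which works with PySpark and Delta tables); the record fields, sample values, and the functions `to_silver` and `to_gold` are hypothetical, chosen only to illustrate what each layer does.

```python
from dataclasses import dataclass
from datetime import date

# Bronze: raw records kept exactly as ingested (all strings, possibly messy).
# Hypothetical sample data, not the project's datasets.
BRONZE_ROWS = [
    {"order_id": "1", "customer": " alice ", "amount": "19.90", "order_date": "2024-01-05"},
    {"order_id": "2", "customer": "BOB", "amount": "5", "order_date": "2024-01-06"},
    {"order_id": "3", "customer": "alice", "amount": "not_a_number", "order_date": "2024-01-07"},
]


@dataclass(frozen=True)
class SilverOrder:
    """A cleaned, typed order record (hypothetical schema)."""
    order_id: int
    customer: str
    amount: float
    order_date: date


def to_silver(rows):
    """Silver: standardize text, cast types, and drop rows that fail validation."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append(SilverOrder(
                order_id=int(row["order_id"]),
                customer=row["customer"].strip().title(),  # " alice " -> "Alice"
                amount=float(row["amount"]),
                order_date=date.fromisoformat(row["order_date"]),
            ))
        except (ValueError, KeyError):
            # A real pipeline might quarantine bad rows instead of skipping them.
            continue
    return cleaned


def to_gold(orders):
    """Gold: build a customer dimension with surrogate keys and a fact table."""
    dim_customer = {}
    for order in orders:
        # Assign the next surrogate key the first time a customer appears.
        dim_customer.setdefault(order.customer, len(dim_customer) + 1)
    fact_orders = [
        {
            "order_id": order.order_id,
            "customer_key": dim_customer[order.customer],
            "order_date": order.order_date,
            "amount": order.amount,
        }
        for order in orders
    ]
    return dim_customer, fact_orders


silver = to_silver(BRONZE_ROWS)
dim_customer, fact_orders = to_gold(silver)
print(dim_customer)
print(fact_orders)
```

With this sample data, `to_silver` keeps two of the three rows (the one with a non-numeric amount fails validation), and `to_gold` produces a small customer dimension plus a fact table that references it by surrogate key. In Databricks, the same steps would typically be written as Spark transformations, with each layer persisted as its own set of Delta tables.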
Tech stack:
- Databricks
- Apache Spark
- PySpark
- Spark SQL
- Delta Lake
- Unity Catalog
Prerequisites:
- Basic knowledge of SQL and Python, plus some PySpark
- No prior Databricks experience required
More courses and content from Data With Baraa:
- 🏅 SQL Full Course → Start here
- 🏅 Tableau Full Course → Start here
- SQL Full Course → Watch on YouTube
- Python Full Course → Watch on YouTube
- Tableau Full Course → Watch on YouTube
- Real-World Data Projects → Watch on YouTube
- Data Career Roadmaps → Watch on YouTube
This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.
Hi, I’m Baraa Khatib Salkini, also known as Data With Baraa. I’m a senior data professional and educator with over 17 years of industry experience, working across data engineering, analytics, and modern data platforms. I’ve led large-scale data projects in real companies and now focus on teaching practical, real-world data skills through my courses, YouTube content, and bootcamps. My goal is simple: help you understand how data actually works in real systems, not just how to write code.