End-to-end Data Lakehouse project built on Databricks, following the Medallion Architecture (Bronze, Silver, Gold). Covers real-world data engineering and analytics workflows using Spark, PySpark, SQL, Delta Lake, and Unity Catalog. Designed for learning, portfolio building, and job interviews.

DataWithBaraa/databricks_bootcamp_2026

Databricks Bootcamp 2026

Welcome to the Databricks Data Lakehouse Project by Data With Baraa.

This repository contains a complete, real-world Data Lakehouse implementation built on Databricks, including datasets, notebooks, SQL examples, and exercises. Everything here is designed to help you understand how modern data teams use Databricks in practice, from data ingestion and transformation to analytics-ready data products.


⚠️ Important Note

Build this project on your own first using the Notion roadmap.
Use this repository only as a reference if you get stuck.

Before starting, watch the Databricks Bootcamp, where I explain the architecture and decisions behind this project.


🏗️ Architecture

This project follows the Medallion Architecture:

🥉 Bronze Layer

  • Raw data ingestion
  • Schema inference and storage as Delta tables
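A minimal sketch of a Bronze load, assuming CSV source files and a Databricks notebook where `spark` is already defined. The catalog name `workspace`, the schema name `bronze`, the source-system prefix, and the file path are hypothetical placeholders, not names taken from this repository.

```python
from pathlib import PurePosixPath


def bronze_table_name(source_system: str, file_path: str,
                      catalog: str = "workspace", schema: str = "bronze") -> str:
    """Build a Unity Catalog three-level name: catalog.schema.table.

    The default catalog and schema are hypothetical placeholders.
    """
    stem = PurePosixPath(file_path).stem.lower()
    return f"{catalog}.{schema}.{source_system}_{stem}"


def ingest_to_bronze(spark, source_system: str, file_path: str) -> str:
    """Read a raw CSV with schema inference and store it as a Delta table."""
    df = (
        spark.read
        .option("header", True)       # first row holds the column names
        .option("inferSchema", True)  # let Spark guess the column types
        .csv(file_path)
    )
    table = bronze_table_name(source_system, file_path)
    # Overwrite keeps the sketch re-runnable; the data itself stays untransformed.
    df.write.format("delta").mode("overwrite").saveAsTable(table)
    return table
```

In a notebook this might be called as `ingest_to_bronze(spark, "crm", "/Volumes/.../cust_info.csv")`, with the path replaced by wherever your raw files actually live.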

🥈 Silver Layer

  • Data cleaning and standardization
  • Type casting and validation
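The Silver steps above could look roughly like the sketch below. It assumes a Bronze DataFrame is already loaded and uses Spark SQL expressions through `selectExpr`; the column names and target types in the usage note are hypothetical, and the exact cleaning rules in this project may differ.

```python
def silver_select_exprs(columns: dict[str, str]) -> list[str]:
    """Turn {column: target_type} into Spark SQL expressions.

    String columns are trimmed; every other column is cast to its target type.
    """
    exprs = []
    for name, target_type in columns.items():
        if target_type.lower() == "string":
            exprs.append(f"TRIM(`{name}`) AS `{name}`")
        else:
            exprs.append(f"CAST(`{name}` AS {target_type.upper()}) AS `{name}`")
    return exprs


def clean_to_silver(bronze_df, columns: dict[str, str], key_column: str):
    """Standardize a Bronze DataFrame.

    key_column must be one of the keys in `columns`.
    """
    return (
        bronze_df
        .selectExpr(*silver_select_exprs(columns))  # trim and cast
        .filter(f"`{key_column}` IS NOT NULL")      # basic validation
        .dropDuplicates([key_column])               # keeps one arbitrary row per key
    )
```

For example, `clean_to_silver(df, {"customer_id": "int", "first_name": "string"}, "customer_id")` would return a DataFrame with a typed key and trimmed names. If you need to control which duplicate survives, a window function over a timestamp column is a better fit than `dropDuplicates`.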

🥇 Gold Layer

  • Dimensional Data Model (Business Transformation)
  • Ready for BI and analysis
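As an illustration of the Gold layer, here is one common way to build a dimension table with a generated surrogate key using Spark SQL. This is a sketch under assumptions: the table names, the `surrogate_key` column, and the use of `ROW_NUMBER()` are illustrative choices, not necessarily how this project models its dimensions.

```python
def build_dimension_query(silver_table: str, business_key: str,
                          attributes: list[str]) -> str:
    """Return a SELECT that adds a surrogate key to a Silver table."""
    attribute_list = ", ".join(attributes)
    return (
        f"SELECT ROW_NUMBER() OVER (ORDER BY {business_key}) AS surrogate_key, "
        f"{business_key}, {attribute_list} "
        f"FROM {silver_table}"
    )


def create_gold_dimension(spark, gold_table: str, silver_table: str,
                          business_key: str, attributes: list[str]) -> None:
    """Materialize the dimension as a Gold table from the query above."""
    query = build_dimension_query(silver_table, business_key, attributes)
    spark.sql(f"CREATE OR REPLACE TABLE {gold_table} AS {query}")
```

A call such as `create_gold_dimension(spark, "workspace.gold.dim_customers", "workspace.silver.crm_customers", "customer_id", ["first_name", "country"])` would produce a table that BI tools can query directly. Fact tables would then reference `surrogate_key` instead of the source system's own identifier.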

🛠️ Technologies Used

  • Databricks
  • Apache Spark
  • PySpark
  • Spark SQL
  • Delta Lake
  • Unity Catalog

Prerequisites

  • Basic SQL and Python, plus some PySpark knowledge
  • No prior Databricks experience required

☕ Stay Connected

🌍 Connect With Me

YouTube LinkedIn Website Newsletter


🛡️ License

This project is licensed under the MIT License. You are free to use, modify, and share this project with proper attribution.

🌟 About Me

Hi, I’m Baraa Khatib Salkini, also known as Data With Baraa. I’m a senior data professional and educator with over 17 years of industry experience, working across data engineering, analytics, and modern data platforms. I’ve led large-scale data projects in real companies and now focus on teaching practical, real-world data skills through my courses, YouTube content, and bootcamps. My goal is simple: help you understand how data actually works in real systems, not just how to write code.
