clumsydonkey@lemmy.world

clumsydonkey@lemmy.world

Hi fellow data engineers,

Currently I’m restructuring a pipeline written with pyspark on Databricks. Since it’s a lot of transformations, results in an extensive DAG, but it’s cool to spend some extra processing resources to make a standard dimensional model (apart from the necessary transformations).

Was wondering what real benefits you have seen a star schema design has from the “one big table” approach, I could preach to my team? (My goal mainly would be to have a resulting smaller PowerBI model.)

And as a side question, what tools do you use to create a dimensional model such a star schema with code?

Thanks a lot!

One big table vs a dimensional model

One big table vs a dimensional model