Version control (also known as source control) is both a practice and a tool for tracking and managing changes in software code. It enables multiple people to work collaboratively on a project, track modifications, and coordinate updates. Version control systems maintain a history of changes, allowing users to revert to previous versions, compare differences between versions, and merge changes made by different contributors. This ensures the integrity and traceability of the work and makes collaboration more efficient, both in software development and in other collaborative projects.
Microsoft Fabric has built-in support for Azure DevOps Repos as a version control system. In this blog post I’m going to go through how it can be enabled for Fabric notebooks and how it works.
Connect Workspace to an Azure DevOps Repo
The Azure DevOps Repos integration is done at the workspace level. This means that every notebook in the workspace will be placed under version control when the connection is set up. To begin, we need to navigate to the workspace settings page. You can find this by clicking Home and sorting your quick access items by type in descending order.
On the workspace settings page there is a Git integration option under System storage. I didn’t have this option in my personal workspace. I don’t know whether that is because my workspace is quite old, but when I created a new workspace the settings page was available.
When setting up the Azure DevOps connection you have to set the organization, project, repository and branch. Only Git-type version control can be used. The Azure DevOps account must be registered to the same user that is using the Fabric workspace, so your Fabric account must have access to the linked Azure DevOps repo.
There are some other limitations: the maximum branch name length is 244 characters, the full path limit is 250 characters and the maximum file size is 25 MB. You can read more about the limitations from this page.
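These limits can be checked against a local clone before anything is pushed. The following is only a minimal sketch: the function name check_fabric_git_limits and the constant names are made up for this example, the "full path" is assumed to mean the path relative to the repository root, and 25 MB is assumed to mean 25 * 1024 * 1024 bytes. Fabric may measure these differently.

```python
from pathlib import Path

# Limits quoted in the text; the constant names are made up for this sketch.
MAX_BRANCH_NAME_LENGTH = 244
MAX_FULL_PATH_LENGTH = 250
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def check_fabric_git_limits(repo_root, branch_name):
    """Return a list of human-readable problems found in a local clone."""
    problems = []
    if len(branch_name) > MAX_BRANCH_NAME_LENGTH:
        problems.append(f"branch name is {len(branch_name)} characters long")

    root = Path(repo_root)
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip Git's own metadata directory.
        if relative.parts[0] == ".git":
            continue
        # Assumption: the limit applies to the path relative to the repo root.
        relative_text = relative.as_posix()
        if len(relative_text) > MAX_FULL_PATH_LENGTH:
            problems.append(f"path too long: {relative_text}")
        if path.is_file() and path.stat().st_size > MAX_FILE_SIZE_BYTES:
            problems.append(f"file too large: {relative_text}")
    return problems
```

An empty list means none of the three limits were exceeded under the assumptions above.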
I also noticed that the Git folder (if used) must already be present in the repository. I was unable to create the connection if the Git folder had not already been created in the branch. So a good idea is, for example, to create a folder called “notebooks” in the repository, add an empty text file to the folder and then create the connection in Fabric.
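If you work from a local clone, the folder and the empty file can be created with a few lines of Python. This is just a sketch: the function name ensure_git_folder and the file name placeholder.txt are hypothetical, and any file name should do.

```python
from pathlib import Path


def ensure_git_folder(repo_root, folder_name="notebooks"):
    """Create the folder and an empty placeholder file inside a local clone."""
    folder = Path(repo_root) / folder_name
    folder.mkdir(parents=True, exist_ok=True)
    # Git does not track empty directories, so the folder needs at least one file.
    placeholder = folder / "placeholder.txt"  # hypothetical file name
    placeholder.touch(exist_ok=True)
    return placeholder
```

The placeholder still has to be committed and pushed to the branch before the folder becomes visible in Azure DevOps.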
Usage
After the integration is ready, press Source control at the top right and click the sync branch button. Wait a few seconds and refresh the workspace tab. If everything has gone well, you should see a Git status column with the status Synced.
In my example I already had one notebook, called Notebook 1, in the workspace. I wanted to check that existing notebooks do not disappear when you enable the integration, and they did not. My notebook was added to the DevOps repo, as I expected.
Note that the synchronization does not happen automatically. If you change a notebook, you need to commit the changes from the source control page. If you have pending changes that have not been synced to DevOps, you will see a red square with a number indicating how many changes are pending synchronization.
On the source control tab you can enter a commit message, select which notebook changes you want to include in the commit and click Commit at the bottom. This will create a new commit in the repository.
You can verify the commit from the Azure DevOps repositories page. Click Repositories > Files in the left menu and select the correct repository at the top. As shown in the image below, the synchronization creates one folder per notebook in DevOps, each containing item.config.json, item.metadata.json and notebook-content.py files.
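The same check can be done on a local clone. The sketch below looks for folders that contain notebook-content.py and reports which of the three files, if any, are missing from each. The function name find_notebook_folders is hypothetical, and the only thing it assumes about the layout is what is described above.

```python
from pathlib import Path

# File names listed in the text above.
EXPECTED_FILES = ("item.config.json", "item.metadata.json", "notebook-content.py")


def find_notebook_folders(git_folder):
    """Map each folder holding a notebook-content.py to its missing files."""
    result = {}
    for content_file in sorted(Path(git_folder).rglob("notebook-content.py")):
        folder = content_file.parent
        missing = [name for name in EXPECTED_FILES if not (folder / name).is_file()]
        result[folder.name] = missing
    return result
```

A notebook folder that maps to an empty list has all three files in place.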
You can also edit notebooks directly in Azure DevOps, or clone the repository to your own computer and use a tool like Visual Studio Code to edit the notebooks. In this case you have to commit and push the changes from your local computer back to Azure DevOps and then sync them into Microsoft Fabric. You can start this sync from the same source control page. You will receive a warning about pending updates, which indicates that there are changes in Azure DevOps that are not yet present in Microsoft Fabric. These changes can be pulled into Microsoft Fabric by clicking the update all button.
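For the local route, the commit and push step can be scripted. This is a rough sketch that assumes Git is installed, that the remote is named origin and that the branch is called main. Both names are placeholders, as are the function names build_push_commands and push_local_changes.

```python
import subprocess


def build_push_commands(commit_message, remote="origin", branch="main"):
    """Return the Git commands that stage, commit and push local edits."""
    return [
        ["git", "add", "--all"],
        ["git", "commit", "-m", commit_message],
        ["git", "push", remote, branch],
    ]


def push_local_changes(repo_root, commit_message, remote="origin", branch="main"):
    """Run the commands inside the local clone; stops at the first failure."""
    for command in build_push_commands(commit_message, remote, branch):
        subprocess.run(command, cwd=repo_root, check=True)
```

This only gets the changes as far as Azure DevOps. They still have to be pulled into Fabric from the source control page.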
Summary
Version control proves to be a valuable ally for data scientists, offering practical advantages in managing code in Microsoft Fabric. With Git integration, collaboration becomes seamless, enabling multiple team members to work concurrently. The system keeps a detailed record of code changes, supporting experimentation with new algorithms or data processing techniques without compromising project integrity. For data scientists, version control serves as a reliable safeguard against errors and provides the flexibility to revert to previous states when needed.