top of page

Integration Strategies with Azure Synapse Link

Can you imagine being able to access your company's data in real time, without affecting the performance of your system and with a scalable and flexible infrastructure? This is what Azure Synapse Link offers you, a tool that allows you to export Finance and Operations and Dataverse data to an external data warehouse.

In the current landscape, it is increasingly common to find discussions about the transformation or migration of traditional integration methods between Finance and Operations and Datalake towards a renewed infrastructure. One of the most significant changes in this process lies in the transfer of the administration of this functionality from the Finance environment to the Power Platform portal.

This paradigm shift is materialized in Azure Synapse Link, a robust tool that allows you to export data from both Finance and Dataverse to an external storage account. From this point, a range of possibilities opens up for each environment, who will be able to use this data to generate reports or integrations with third-party applications or Microsoft's own solutions.


After some time observing all these possibilities, perhaps the biggest question that has come to mind is identifying and choosing which is the best way to do it and what are the implications of each of them.


That is why my intention with this article is to centralize this information, see the examples of each implementation typology and how this data is subsequently consumed. We will also see how it influences the cost of resources in Azure.


Let's start at the beginning, a little index about the article:



What is Azure Synapse Link?


Azure Synapse Link is a functionality that is managed from Power Platform that allows us to export data from 2 different data sources to Azure Storage:

  • Dataverse

  • Finance and Operations

With this functionality we can consume this data without pulling the transactional and set up a scalable and common DataWarehouse infrastructure for several data sources.


Previous configuration in Finance and Operations


Regarding Finance and Operations, a series of requirements must be met for the integration to occur:


  1. Environment version

  • 10.0.36 (PU60) cumulative update 7.0.7036.133 or later.

  • 10.0.37 (PU61) cumulative update 7.0.7068.109 or later.

  • 10.0.38 (PU62) cumulative update 7.0.7120.59 or later



Installation typologies


Previous question


Once we have almost all the elements we need, it is necessary to stop to think about what type of connection we want. With this we can create the Azure resources that are required.


The key question to ask is: how do we fully consume the data?


We have 3 possible answers:


1. Access to Finance and Operations tables through queries in Synapse Analytics

2. Load incremental data changes to your own data warehouse

3. Access Finance and Operations tables via Microsoft Fabric


This question is key when determining the infrastructure that needs to be set up.


Once you have determined how you plan to consume the data, you can identify the specific functionality that needs to be installed:


  1. Delta lake: Provides better read performance by saving data in delta parquet format.

  2. Incremental Update: Makes it easy to upload incremental changes to your data warehouse by storing the data in CSV format.

  3. Link to Fabric: Provides access to financial and operations tables through Microsoft Fabric.


Necessary resources


For each type of integration these are the Azure resources necessary to create. IMPORTANT: Resources must be in the same Azure region as the Finance-Power Platform environment


  1. Delta lake: Requires an Azure GEN2 storage account, a Synapse workspace, and a Synapse Spark pool. In this link you have the details of the installation : https://learn.microsoft.com/en-us/power-apps/maker/data-platform/azure-synapse-link-delta-lake#prerequisites

  2. Incremental upgrade: Only requires an Azure storage account

  3. Link to Fabric: A Microsoft Fabric workspace is required.


With this detailed understanding of the available options, it will be easier for users to select the right integration based on their specific needs. This will allow them to take full advantage of the capabilities of Azure Synapse Link and power their financial operations efficiently and effectively.


Facility


We are going to detail the steps that must be carried out for each type of installation. It is important to note that all installations are done from the same place, on the PowerApps platform of the corresponding environment, in the Azure Synapse Link section:



__________________________________________________________________________________


Deltalake


For the Deltalake type installation we have to mark the following configuration in all the green boxes:

  • Subscription

  • Resource group

  • Synapse workspace

  • Spark Pool

  • Azure storage account


Incremental data lake


For the installation of the incremental Datalake type we have to mark the following configuration in all the green boxes:

  • Subscription

  • Resource group

  • Azure storage account


Subsequent steps


We click Next and a screen will appear that we must configure as follows for both types of installation. In the case of Deltalake it is checked by default. It is important to take into account the refresh time since this will mean an increase in Azure consumption, especially in the case of Deltalake since it uses a Spark resource (Azure Databricks) to generate the parquet files.

Once the checks have been marked, the Finance and Operations tables tab will appear:


And there we mark the tables that we want to export using the functionality:




__________________________________________________________________________________


Fabric


For the installation of the Fabric type we have to follow the same steps as for the installation of the Deltalake .

Once we have completed the Deltalake installation phase we will see that if we place ourselves above the connection created, a Link to Microsoft Fabric icon appears:



It will ask us which Fabric workspace to link it to and from here we will have it available:



Final score


At this point we already have our data synchronizing with the outside world through the functionality that we are describing in this article. Now it is necessary that we see how we are going to find them since it varies a lot depending on the scenario.


Deltalake


In the case of Deltalake we will have a container in the storage account with the name of our environment. This name is not customizable since it is configured by default by the tool.

And inside that container we will find the following:



In the deltalake folder there will be the data itself already converted by our Spark Pool to parquet format:




In the EntitySyncFailure folder we can consult the particular error of a table that we see has not been synchronized successfully, although we can also consult this error from the platform itself where we have activated Synapse Link:


And in the Changelog folder we can find the metadata of each activated table as well as the structure-level changes they have undergone:



Additionally there are folders with timestamps. Incremental files are temporarily stored in CSV format before they are transformed into parquet by the Spark resource.


Incremental data lake


In the case of incremental Datalake, the structure we find is exactly the same except for the deltalake folder:



In this case the data resides in time-stamped folders, organized hierarchically according to the name of the table and with the content in CSV format in annual partitions:




And the metadata of the tables inside the Microsoft.Athena.TrickleFeedService folder:



Fabric


The final result is the same as for Deltalake, the only difference lies in consuming the data, which we will discuss in the next section.


Subsequent data consumption


Now we know how to activate the functionality, we have exported the data, we know how this data is stored in the storage account; but...and now how can we exploit and use them?

In the same way as in the previous sections, the way to exploit the data depends on the installation we have done. So let's differentiate them.


Deltalake


In order to view the data of this type of installation, it is necessary to access Synapse Studio. Let's remember that Synapse is one of the resources that we have implemented and that there is the Spark Pool. This duo has been in charge of setting up a kind of SQL database (without being one) that is fully accessible through queries with the same table structure that Finance and Operations has. So we will see all the tables there, we can launch queries on them and set up views to facilitate data ingestion.


To access Synapse Studio, we go to the Synapse Link that we have created on the PowerApps platform and click on the icon to go to Synapse Workspace:





Here is an example of a query:


In the same way you can also access the data using SQL Management Studio using Synapse as a SQL server. You can find the connection data in the resource in the Azure portal:


Incremental data lake


In order to exploit the data exported through incremental data lakes, it is necessary to know the concept of the Common Data Model.

The Common Data Model (CDM) is a metadata system that allows you to easily share data and its meaning between applications and business processes.

  1. Structure of the Common Data Model :

  • The CDM is a standard, extensible collection of schemas (entities, attributes, relationships) that represent business concepts and activities with well-defined semantics.

  • The CDM facilitates data interoperability by providing a common structure for representing information.

  • Source properties include:

  • Format : Must be cdm .

  • Metadata format : Indicates whether you are using manifest or model.json .

  • Root location : Name of the CDM folder container.

  • Folder path : CDM root folder path.

  • Entity path : Path of the entity within the folder.

  • Manifest name : Name of the manifest file (default is 'default').

  • Schema linked service : The linked service where the corpus is located (e.g. 'adlsgen2' or 'github').

This CDM format that we discussed is the structure that we have seen in the storage account. A series of CSV files (data) with others in JSON format (structure).

Once this format is known, the subsequent task to be able to operate with them is to set up a pipeline or similar in Azure Data Factory to move this data incrementally to a Data Warehouse destination. In later articles we will delve into how to create this type of pipeline, but I leave you a couple of links where you can find tutorials or ready-made templates:




Fabric


At the moment the link between Finance and Operations and Fabric is not active so this section will be updated when it is available.


Limitations


In the following link you can find the limitations that exist in the use of Synapse Link with Finance and Operations:


Conclusions


I hope that with this article I have been able to make clear the ideology behind all this, the different ways of setting it up and that each one is able to decide which of the installations is the most appropriate in their case. We have been able to see the following:

  1. Azure Synapse Link : We discovered how this tool allows us to export data from Finance and Operations and Dataverse to an external storage account, opening a range of possibilities for reports and integrations.

  2. Simple Configuration : We learned how to configure this functionality from the Power Platform portal, choosing the appropriate installation type according to our needs.

  3. Intelligent Storage : We explore how data is stored in the storage account depending on the chosen format, and how to access and consume this data from Synapse Analytics, SQL Management Studio, or Microsoft Fabric.

  4. Limitations and Resources : We review the limitations when using Azure Synapse Link with Finance and Operations, and the Azure resources required for each type of installation.

See you in the next article!!

Comments


bottom of page