D365FO DataLake Integration

Ivan Font Miñarro
May 17, 2022
5 min read

Updated: Aug 21, 2022

After a while without posting we are back with new features!

We are heading to the new blog entry. It is known that a new integration functionality between D365FO and the "outside" was born some time ago. This function was in preview and many of us have been able to test it before it was made public. We are talking about the integration between D365FO and Datalake.

With this new functionality we can export the content of the Dynamics tables in "real time" to an Azure Datalake and from there exploit this data with our analytical tools. This export does not require any maintenance like the previous BYOD and saves us a lot of headaches.

What is the DataLake?

To get an idea of what Datalake means, we are going to draw a parallel with reality. As you all know, there are a series of buildings or warehouses in cities and towns that have the function of personal storage for individuals and companies. The client pays a monthly amount and has his own small warehouse to store everything he wants there. It can be from information to furniture, through memories or whatever comes to mind. These objects are stored there and from the outside you can only see a locked "cabin".

This is what an Azure Datalake would be; a storage space in the cloud that supports a wide variety of objects and sizes and that is also structured, secured and optimized to be consulted. Thus, we can store from text files of a few KB to images of virtual machines of several GBs. All this following a hierarchical structure like the one we see on our PC.

Generations

Azure Datalake is made up of a diverse variety of technologies. The details of each of them and relevant information can be found here. However, in this article we will only focus on what directly concerns the connection with D365FO: Azure DataLake Store (ADLS).

But first let's take a look at the general evolution that ADLS has had to get where it is now. ADLS has 2 generations on its back; the first has gone unnoticed by Dynamics and had the following characteristics:

Built for Hadoop
Unlimited storage, petabyte sized files
Optimized performance for big data analytics
Ready for the enterprise: durable and highly available

The second generation (ADLS Gen 2) adds all the features of the first generation with the capabilities, versatility, and capacity of Azure Blob storage. Data Lake Storage Gen2 provides file system semantics, file-level security, and scalable, low-cost tiered storage with high availability and disaster recovery capabilities.

Which is the aim of this feature?

Once we know the characteristics and essence of this Azure service, we are going to list the objectives of joining it to D365FO.

Extract data in real time
Not relying on data mining jobs (as we currently do with BYOD)
Do not depend on Data Entities (therefore we do not depend on development tasks)
Unify data sources (Datalake allows us to connect not only D365FO but also other external data sources and centralize everything in a single service)

Infrastructure

In the following image you can see the idea regarding the infrastructure of Azure services that is behind this functionality.

In addition to the Datalake functionality itself that we will install from LCS, a series of other elements are necessary:

CDMUtil: this tool will help us to be able to visualize our data from Azure Synapse since it will be in charge of generating some views on the CSV files generated by the export to Datalake.
Azure Synapse Analytics: this service will be the core point of the functionality, since it will be the anchor point to which all analysis services will have to connect to consult the data

Setup in D365FO

To be able to use this feature in D365FO it is necessary to configure the link between the D365FO environment and PowerPlatform. For this, LCS makes our work easier and we can link it by clicking on the following button:

Once you have done this previous step, you can consult the guide on the installation of Azure services at the following link:

https://docs.microsoft.com/en-us/dynamics365/fin-ops-core/dev-itpro/data-entities/configure-export-data-lake

We will stop at the installation point of the Add-in itself. It is important to mention that the functionality we are talking about has some public features and others that are still in Preview. Unfortunately, the test part is the one that interests us the most since it is the real time itself. This point is only available for UAT Tier-2 environments and on request for productive environments. Thus, we will have the possibility to test it if we mark the related check of the LCS installation wizard.

Once installed we can go to D365FO to the following route:

System administration > Configure > Export data to Azure Data Lake

If we do not see this route we can access it through the URL by adding the following information:

&mi=DataFeedsDefinitionWorkspace

We will see that each and every one of the D365FO tables appears, both standard and customized, and by means of the "Activate" and "Deactivate" buttons we can make them synchronize in Datalake.

Also in the "Last change processed" column we can see when the last change synchronization was made:

Once this is done, that's all we have to configure in D365FO. As you can see, it is very simple to activate the functionality from the environment itself, and we can access it whenever we want to activate or deactivate the tables we need.

CDMUtil

But not everything ends here. When we have activated the synchronization the data navigates from our D365FO environment to the Datalake storage account. The appearance of our data is in the form of CSV organized in folders and subfolders following the CDM (Common Data Model) structure.

That is why we need a tool that allows us to create a bridge between something easy to read and the Datalake file network. This tool is CDMUl that you can download from the following Github repository:

https://github.com/microsoft/Dynamics-365-FastTrack-Implementation-Assets/tree/master/Analytics/CDMUtilSolution

Once downloaded we have to modify the following file:

Specifically, correctly assign the values of the following image to those created in our Azure portal:

Once the tool is configured and executed, it will build a series of views (one for each table activated in D365FO) in our Azure Synapse. The vision that we are going to have will be that of the following image, that of a database with exactly the same views in terms of structure and type of columns as those we have in the original database.

It is important to keep in mind that every time we activate a new table or make changes to the structure of one of them, we must run CDMUli again so that the views are refreshed.

We can also configure the tool to run automatically when there is a change in the structure of Datalake but we will not detail this configuration.

Post analysis

Once we have our views in Azure Synapse, we can connect our analysis tool (either PowerBI, Excel...) and start creating our metrics. But often it is possible that not only do we have to take into account the data from D365FO, but there may be data sources other than the ERP and it is our task to create the solution that best suits our needs.

For this, I attach below a summary table extracted from the TechTalk on Datalake architecture that you can see at the following link:

https://community.dynamics.com/365/dynamics-365-fasttrack/b/techtalks/posts/export-to-azure-data-lake---scenarios-and-architecture-patterns-april-26-27-2022

I hope you liked this article. Big changes are expected in the integration of the different applications of the Dynamics ecosystem!!