ARAG Insurance seizes the power of automation with data lake technology

European insurance group ARAG SE partnered with Xomnia to use data to improve its way of working and its customer service. It did so by migrating its operations to a data lake and by automating the formatting of the sales data supplied by its third-party providers.

The project, known as the Datahub, was successfully completed at the end of 2022, but our collaboration with ARAG SE continues. With it, we aim to satisfy the growing data needs of ARAG’s business and modernize its IT infrastructure with a future-proof data lake.

Xomnia has helped us, and continues to help us, develop on our own. Our data engineers have learned a lot from the expertise that Xomnia has brought to ARAG and are continuously applying the lessons learned during the development of the Datahub.


The vast majority of ARAG SE’s insurance policies are sold by hundreds of third-party providers, such as major banks. The insurer, however, couldn’t make full use of the data on policies sold through those external partners, because different partners report their sales differently and through different channels. Consequently, working with the reported data was tedious and involved a considerable amount of manual work.

The insurer approached Xomnia to collaborate on overcoming this challenge by creating the data infrastructure needed to clean and unify all data about its sales, policy portfolio and claims. This would also allow it to generate insights from its various sales data quickly and accurately.


The first challenge that needed to be addressed was replacing ARAG’s traditional in-house data warehouse with a data lake, because the on-premises data warehouse had become outdated and costly to operate and maintain. Xomnia’s Machine Learning Engineers Dustin van Weersel, Siem van den Reijen and Maarten van Raaij joined ARAG SE’s Datahub team to further develop the data lake and accompanying data marts to replace the old data warehouse.

Next, our team worked on creating a data pipeline that automates the cleaning and transformation of all the data on sold insurance policies into a universal format, from which insights can be generated.

“Using certain parameters and settings, we tried to automate this process, and conducted some manual mapping tasks to get to where we want to go, based on research about each distribution partner and the information that they supply,” explained Siem.
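The idea of per-partner parameters combined with manual mapping can be illustrated with a minimal Python sketch. It is not the Datahub's actual code: the partner names, source column names, date formats and the `PolicySale` schema below are all hypothetical, chosen only to show how a small configuration per distribution partner can drive one shared transformation into a universal format.

```python
from dataclasses import dataclass
from datetime import date, datetime


# Hypothetical universal format for one sold policy (not ARAG's real schema).
@dataclass(frozen=True)
class PolicySale:
    partner: str
    policy_id: str
    sold_on: date
    premium_eur: float


# Hypothetical per-partner settings: a manually researched column mapping
# plus the parameters needed to parse that partner's dates and amounts.
PARTNER_CONFIG = {
    "bank_a": {
        "columns": {"PolicyNo": "policy_id", "SaleDate": "sold_on", "Premium": "premium_eur"},
        "date_format": "%d-%m-%Y",
        "decimal_comma": True,   # amounts look like "1.234,50"
    },
    "broker_b": {
        "columns": {"id": "policy_id", "date": "sold_on", "amount": "premium_eur"},
        "date_format": "%Y/%m/%d",
        "decimal_comma": False,  # amounts look like "99.95"
    },
}


def normalize(partner: str, row: dict[str, str]) -> PolicySale:
    """Convert one raw partner record into the universal format."""
    cfg = PARTNER_CONFIG[partner]
    # Rename the partner's columns to the unified names and trim whitespace.
    renamed = {target: row[source].strip() for source, target in cfg["columns"].items()}
    amount = renamed["premium_eur"]
    if cfg["decimal_comma"]:
        # Drop thousands separators, then turn the decimal comma into a point.
        amount = amount.replace(".", "").replace(",", ".")
    return PolicySale(
        partner=partner,
        policy_id=renamed["policy_id"],
        sold_on=datetime.strptime(renamed["sold_on"], cfg["date_format"]).date(),
        premium_eur=float(amount),
    )
```

In a design like this, onboarding a new partner means adding a configuration entry rather than writing new transformation code, which is one way the manual effort described above could be reduced.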

After the data has been transformed, it is used to create a data model within SQL Server, a relational database. The resulting data model is then ingested by Power BI to create datasets, which are made available to the business for reporting. Data engineers and data scientists can directly use the relational data model within SQL Server for analysis and modeling purposes.

Since ARAG SE’s specialty is (legal) insurance, it mainly deals with policies sold to customers and the claims they make on those policies. Therefore, our team is developing separate data models dedicated specifically to policies and claims. This will give ARAG greater insight into its portfolio, as well as help it improve the services it provides to its customers.
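To make the idea of a relational model for policies and claims concrete, here is a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for SQL Server so that it runs anywhere; the table names, columns and sample rows are assumptions for illustration and do not reflect ARAG's actual data model.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE policy (
        policy_id   TEXT PRIMARY KEY,
        partner     TEXT NOT NULL,
        premium_eur REAL NOT NULL
    );
    CREATE TABLE claim (
        claim_id   TEXT PRIMARY KEY,
        policy_id  TEXT NOT NULL REFERENCES policy(policy_id),
        amount_eur REAL NOT NULL
    );
    """
)

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO policy VALUES (?, ?, ?)",
    [("P1", "bank_a", 100.0), ("P2", "bank_a", 200.0), ("P3", "broker_b", 150.0)],
)
conn.executemany(
    "INSERT INTO claim VALUES (?, ?, ?)",
    [("C1", "P1", 40.0), ("C2", "P1", 10.0)],
)


def premium_and_claims_by_partner(connection: sqlite3.Connection) -> list[tuple]:
    """Return (partner, total premium, total claims) per distribution partner."""
    query = """
        SELECT p.partner,
               SUM(p.premium_eur),
               COALESCE(SUM(c.total), 0.0)
        FROM policy AS p
        LEFT JOIN (
            -- Aggregate claims per policy first, so a policy with several
            -- claims does not have its premium counted more than once.
            SELECT policy_id, SUM(amount_eur) AS total
            FROM claim
            GROUP BY policy_id
        ) AS c ON c.policy_id = p.policy_id
        GROUP BY p.partner
        ORDER BY p.partner
    """
    return connection.execute(query).fetchall()
```

A reporting tool or an analyst could run a query of this kind against the shared model instead of reconciling partner files by hand.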


The dashboard is still being developed, but it is already helping ARAG SE make more use of its policy, claims, and internal and external sales data. For instance, the client can already get more insight into its profit and loss (P&L) statements and full-time equivalents (FTEs). This will help it quickly understand sales trends across its different customer groups and geographies. It will also gain more detailed insight into the performance of specific coverages within its product offerings.

Besides the core business of ARAG, the Datahub also serves various internal departments such as Finance and Control, HR and Sales. For example, consolidating the policy data of all the various reinsurers into a single data model will allow the service center to respond more quickly and with greater confidence in the information, thanks to having all the data in a single source of truth (SSOT).