Data Governance & Sitecore xDB: Why Where Data Sits Actually Matters

June 22, 2016 | TBG Strategy/UX & TBG Tech

xDB Now Powers Sitecore Personalization & Engagement Automation

With the evolution of Sitecore’s personalization, customer engagement and marketing platform in the 8.X versions, Sitecore’s approach to tracking users has improved dramatically, making it the best possible CMS-based system for realizing the goal of highly targeted 1-to-1 message delivery within a rich marketing ecosystem. Sitecore has done this through improvements and extensions to xDB, its analytics database, largely by re-architecting it on MongoDB, a high-performance NoSQL database designed to handle massive volumes of data. xDB’s document-oriented Mongo database can be hosted in any environment, but in many cases it is hosted by cloud providers, a circumstance that will become meaningful below.

The key elements to understand about the Sitecore move to Mongo are:

  • Sitecore wanted a database that would be fast enough to deliver data about customers to power personalization and other “real-time” interactions, which need nearly instantaneous response times, and
  • Sitecore knew that the same speed of customer data access needed for native Sitecore-captured data (Web activity, conversions, content viewed, profiling) would also need to extend to outside data about users, say from a CRM record or a 3rd-party data source.

In previous versions of Sitecore, including non-Sitecore data in personalization rules or engagement plans was difficult. You had to rely heavily on converting that data to native Sitecore data elements, like numeric profile scores, to expose it to the rule builder. You also had to deal with performance issues and stale data, because nightly sync processes were frequently involved. In other words, it was possible to use external data to drive automation within Sitecore, but the methods available were far from best practice.

xDB Means Extensible, Fast Data About Users

With Mongo and the 8.X versions of Sitecore, the picture has dramatically brightened.

  1. Mongo is designed to be extended—to hold custom fields of non-Sitecore data.

  2. Sitecore’s Data Ingestion Manager (available in 8.3) provides an out-of-the-box component to manage data integrations with other systems, based around unique identifiers (say, a captured email address from a form, or a clicked Sitecore-hosted email or landing page link conveying a campaign code).

  3. Once this data is up-to-date in Sitecore, it is a relatively straightforward integration to make it available to the Sitecore rules engine—creating a trigger for personalization or customer engagement plans (multi-channel, email, etc.).

  4. Additionally, once the data is up-to-date in Sitecore, it can be used in the broader Sitecore rules context without a performance penalty, which means very fast-loading pages. From that standpoint, where the data lives matters very much for performance (xDB = faster-loading pages by 10X or more).
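
The flow in the four points above can be pictured with a short sketch. It is not Sitecore's API: a plain Python dictionary stands in for an xDB contact document, and the `merge_external_fields` helper, the `loyalty_tier` field, and the rule function are all hypothetical names. It shows external fields being matched to a contact by a unique identifier, stored alongside native data, and then read by a rule without a call back to the source system.

```python
from typing import Any

# Hypothetical stand-in for xDB contacts, keyed by a unique identifier (here, email).
contacts: dict[str, dict[str, Any]] = {
    "pat@example.com": {"visits": 12, "custom": {}},
}


def merge_external_fields(identifier: str, external: dict[str, Any]) -> bool:
    """Copy external fields onto the matching contact; return False if no match."""
    contact = contacts.get(identifier)
    if contact is None:
        return False
    contact["custom"].update(external)
    return True


def is_gold_member(contact: dict[str, Any]) -> bool:
    """Example rule condition that reads only locally stored data."""
    return contact["custom"].get("loyalty_tier") == "gold"


# A record arriving from a CRM export (assumed shape).
merged = merge_external_fields("pat@example.com", {"loyalty_tier": "gold"})
unmatched = merge_external_fields("nobody@example.com", {"loyalty_tier": "silver"})
show_gold_banner = is_gold_member(contacts["pat@example.com"])
```

Because the rule only touches data already stored with the contact, its cost does not depend on how fast the CRM responds, which is the performance point made in item 4.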

[Figure: Sitecore xDB diagram]

As our practice has evolved, integrating user data from other databases and 3rd-party data sources, and finding clever ways to leverage it in personalization rules, has become a core aspect of the value TBG brings. But this immediately raises the question, “What data should be integrated?” Some of the sources have thousands of data fields, and the way data is held in those systems may be highly specific to each one. Data integrated into xDB should be chosen strategically, usually by working backward from the personalization you want to enable and then integrating only the fields it requires.
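
One way to picture that reverse-engineering step is to start from the planned personalization rules, compute the set of fields they need, and filter each source record down to that set. The rule names and field names below are invented for illustration and do not come from any real system.

```python
# Hypothetical planned rules, each listing the external fields it would read.
planned_rules = {
    "show_renewal_offer": {"membership_status", "renewal_date"},
    "promote_loyalty_upgrade": {"loyalty_tier"},
}

# The fields worth integrating are the union of what the rules need.
required_fields = set().union(*planned_rules.values())


def select_fields(source_record: dict) -> dict:
    """Keep only the fields that some planned rule actually uses."""
    return {k: v for k, v in source_record.items() if k in required_fields}


crm_record = {
    "membership_status": "active",
    "renewal_date": "2016-09-01",
    "loyalty_tier": "silver",
    "fax_number": "555-0100",         # not used by any rule, so it is dropped
    "internal_account_notes": "n/a",  # likewise dropped
}
trimmed = select_fields(crm_record)
```

Starting from the rules rather than from the source schema keeps the integrated set small, and any new field has to be justified by a rule that will use it.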

Some examples of non-Sitecore external data that might be extremely useful to have in the xDB so that it can contribute to real-time personalization and automation include selected fields from:

  • Personal demographics
  • Order/purchase history
  • Loyalty program participation and details
  • User interests or concerns captured in other digital channels
  • Membership status

To the extent information is accurate and up-to-date, and can be integrated into xDB, it has the potential to trigger rules of various kinds, greatly increasing the options and relevance of contextual content, which can mean increased conversion, better user experience, or increased customer satisfaction, or all three. 

How Data Governance Complicates the Situation

There is another large issue, though, beyond the logistical one of performance: data governance. Data governance is the practice of evaluating, and sometimes restricting, where data can be held. Where it applies, it generally cuts in the opposite direction for Sitecore, arguing against mirroring some customer data in xDB. Some types of organizations are tightly restricted in how they must secure and control the flow of certain kinds of data, and may not have the option of moving that data into environments that lack specific controls.

Consider two of the highly regulated industries that TBG works with:

  1. Healthcare
  2. Financial Services

Both industries have (usually secure) back-office systems that contain a plethora of customer data. In both cases there is a CRM (acting as a “clean broker” of some of that information for marketers). Healthcare also uses ERP systems and medical portals that contain details of patient healthcare information; likewise, financial services uses back-end systems that hold account information and transaction history.

The healthcare industry is largely regulated by HIPAA (the US Health Insurance Portability and Accountability Act), to the extent that any records that can reasonably be viewed as protected health information (PHI) are subject to a high level of data governance and concern, including specific limitations on how data can be stored, transmitted, and used, and on who can use it. (Sitecore, however, does not natively capture any PHI or create any HIPAA issues through xDB.)

Though it is legal within certain restrictions to use PHI for marketing purposes, it is generally not advisable from a security standpoint to integrate actual PHI fields into xDB, whether it is hosted on premises or in the cloud. The risk of the data being accessed, and the legal uncertainty about what constitutes exposing it, are just too great to tolerate. However, we can bridge the gap by working with data that is health-related but is not actual PHI, and thus can still be integrated.

A similar situation exists in financial services, which are regulated by a large number of laws, not to mention constrained by insurance and underwriting concerns. Data about individual finances may be extremely valuable for remarketing purposes, as a potential driver of personalized experiences, but outside of secure financial portals, it can be highly problematic to address directly in xDB. In general, financial services firms need to keep this sort of customer information out of xDB, much as healthcare organizations need to avoid spreading PHI to systems without the strict governance safeguards that are appropriate.

Having Your Cake and Eating It Too

Marketers shouldn’t despair, though, because as they say, “where there is a will, there is a way.” Although each case is unique, and marketers need to discuss these matters with legal counsel, two techniques can address many data governance issues and still yield usable extra data fields in xDB, delivering much of the same impact as the data that can’t be used directly.

The two techniques, which can be used in combination, are likely to satisfy many of the considerations that restrict the use of data. They are:

  1. Basic. Abstract the data before it gets to xDB. For instance, at the CRM level, create derivative “segments” or extra fields of metadata on customer records, calculating their values from the restricted data. In healthcare, for example, a marketer might put a diabetes patient into a segment that broadly offers population health services to diabetics and other groups (assuming this is a HIPAA-permitted use) and then integrate that field rather than the underlying PHI.

    Hint: one way to do this is to build a component that shifts the values of Sitecore profile keys based on changes in related CRM fields.

    In financial services, one might abstract from a customer’s net worth to place them in a “market wealth-management solutions, top level” audience segment, which is then flagged in the xDB to drive personalization for wealth management services. These sorts of abstractions can effectively make use of the data without shifting it into an environment that lacks the proper controls.

  2. Advanced. If the basic approach doesn’t cut it for security reasons, an alternative or additional approach can build on those techniques and also include steps to technically obscure the identity of the user in xDB. This matters because integrating external data requires a unique identifier (like an email address, or a tracking code that ties a click on a campaign email link served within the Sitecore ecosystem back to CRM). Obscuring the identifier can be done in a variety of ways and definitely involves custom development.

    One way is to encrypt the unique identifier with a custom component that lives behind the firewall and decrypt it when syncing fields. If the user cannot be personally identified in xDB in any other way (for example, from form email captures or from Sitecore’s Social Connect for Facebook), then it is possible to engineer a scenario that uses abstracted user data but can’t be traced back from xDB to a particular real-world person. This scenario is much more secure and may meet some of the most important requirements, although again, each situation must be evaluated contextually.
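
A rough sketch of how the two techniques might fit together follows. Everything in it is an assumption for illustration: the field names, the net-worth threshold, the segment label, and the helper functions are hypothetical, and none of it is Sitecore or CRM API code. For the identifier, the sketch swaps the encrypting component described above for a keyed hash (HMAC-SHA256) plus a lookup table that stays behind the firewall, because that needs only the Python standard library. A real build would choose its own mechanism with security and legal review.

```python
import hashlib
import hmac

# Assumption: this key and the reverse-lookup table never leave the secure side.
SECRET_KEY = b"replace-with-a-managed-secret"
token_to_crm_id: dict[str, str] = {}


def pseudonymize(crm_id: str) -> str:
    """Return a stable opaque token for a CRM identifier (keyed hash, not encryption)."""
    token = hmac.new(SECRET_KEY, crm_id.encode("utf-8"), hashlib.sha256).hexdigest()
    token_to_crm_id[token] = crm_id  # kept behind the firewall for later syncs
    return token


def derive_segments(record: dict) -> list[str]:
    """Basic technique: turn restricted values into coarse segment flags."""
    segments = []
    # Hypothetical threshold; the raw net worth itself is never exported.
    if record.get("net_worth", 0) >= 1_000_000:
        segments.append("wealth-management-top-level")
    return segments


def build_export(record: dict) -> dict:
    """Advanced technique: export only a token plus the derived segments."""
    return {
        "contact_token": pseudonymize(record["crm_id"]),
        "segments": derive_segments(record),
    }


crm_record = {"crm_id": "C-1001", "email": "pat@example.com", "net_worth": 2_500_000}
export = build_export(crm_record)
```

The exported dictionary carries no email address, CRM identifier, or raw net worth. Mapping a token back to a customer requires the lookup table, which in this sketch exists only on the secure side.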

Either the first technique, or both techniques in combination, can safely open up a lot of possibilities, even in highly regulated industries. But those who ignore these considerations do so at their own peril—since one can make a few moves that can clash violently with either the law, institutional policy or both. It is worth taking the time to figure out the right approach.

So, to sum up the thrust of this blog post:

  • You must put your external data into xDB if you want it to be usable to trigger personalization rules, though it may still require custom development.

  • Have a critical data-governance approach to what data you choose to put into xDB—based around an evaluation of regulatory and security considerations.
  • Realize you have options where you can get the impact of this sort of external data, without putting it into xDB in pure form, identifying the user explicitly, or leaving a breadcrumb trail for hackers back to the original user record in CRM. 

About the Authors

TBG Strategy/UX

TBG’s Strategy/UX team features resident virtuosos of digital business consulting, usability, information architecture, & CMS implementation & user testing. When you meet them, it’s impossible not to immediately catch that the digital world is their oyster.

TBG Tech

TBG’s Technical team is always on the cutting edge of Sitecore, ASP.NET, & new deployment technologies. There’s always a smarter, faster, better way to reach solutions, & we may die trying to find them.
