Establishing industry regulator key to unlocking synthetic data’s potential – Capco comments

With the FCA’s consultation on ‘Synthetic data to support financial services innovation’ now closed, Stephen Brown, Managing Principal at Capco, points to the challenge of truly replicating real data sets and believes establishing a central regulator is key to unlocking synthetic data’s potential: “Synthetic data can provide a valuable bridge for financial service providers to gain insights into customer trends and preferences without breaking privacy and protection rules. However, achieving synthetic data accuracy is critical to establishing the necessary trust within the industry to propel its adoption.

“As the use of synthetic data is largely immature, a lack of trust in its accuracy remains the biggest barrier to wider adoption. Where synthetic data leaves scope to be interpreted as unrepresentative or patently false – for instance when used in conjunction with real data – this will likely generate invalid insights at best, and litigable falsehoods at worst. Ultimately this will push firms to instead favour alternative techniques, such as real world data anonymisation or pseudonymisation.

“To solve this accuracy issue, firms should ensure synthetic data is generated from real-world customer data provided by incumbent organisations, via processes that meet a regulator-defined set of standards. However, given that those real world datasets provide the ‘data owner’ organisation with a competitive advantage over their competition, many will likely be unwilling to volunteer their data for now.

“A preferable and more feasible approach could be an FCA-approved standard that would allow an organisation to take its own data and create its own synthetic datasets for use in its own projects. This achieves the goal of driving greater adoption of the use of synthetic data at scale within an organisation. For the business, there is trust in that the synthetic data is representative; and from a compliance perspective, there is mitigation of risk in that the synthetic data meets a certain set of regulator-defined standards.

“Cross-collaboration with other regulators will also be fundamental to establishing standards for generating synthetic data from an organisation’s own data. Without it, widespread adoption would likely fail as the investment to create locale-specific synthetic datasets would represent a high bar of investment.”