ARN

Google releases differential privacy pipeline for Python

PipelineDP allows datasets containing personal information to be aggregated in a way that preserves the privacy of individuals.

Google is extending differential privacy capabilities to the Python language, with an open source tool for creating pipelines that aggregate data containing personal information in a way that preserves the privacy of individuals.

PipelineDP, developed in partnership with OpenMined and accessible from the project website, is still in an experimental stage. 

With differential privacy, useful insights and services can be provided while strictly limiting what can be learned about any individual in the data. PipelineDP follows the 2019 launch of an open source version of Google's foundational differential privacy library, which supports C++, Go, and Java.

Developers, researchers, and companies can use the new Python library to build privacy-preserving applications that draw insights and observe trends from datasets while protecting and respecting individual privacy, Google said.

PipelineDP can be used with the Apache Spark and Apache Beam data processing frameworks. It has already enabled users to begin experimenting with new use cases, such as showing a website's most-visited pages on a per-country basis in an aggregated, anonymised way.
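To make the per-country example more concrete, here is a minimal plain-Python sketch of the general technique behind such aggregations: cap how much any one user can contribute, then add Laplace noise calibrated to that cap. It does not use PipelineDP's API. The function names, the `(user_id, country, page)` record layout and the assumption that the list of countries and pages to report is fixed in advance (rather than read from the data) are all hypothetical. It also relies on Python's non-cryptographic `random` module and ignores floating-point subtleties, so it illustrates the idea rather than offering a safe implementation.

```python
import random
from collections import defaultdict


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_page_counts(visits, public_keys, epsilon, max_keys_per_user=1, rng=None):
    """Return noisy visit counts per (country, page). Illustrative sketch only.

    visits: iterable of (user_id, country, page) tuples (hypothetical schema).
    public_keys: (country, page) pairs chosen in advance, not derived from the
        data, so the set of released keys does not itself leak information.
    """
    rng = rng or random.Random()
    allowed = set(public_keys)

    # Contribution bounding: each user adds at most 1 to at most
    # `max_keys_per_user` distinct keys, so one user can change the whole
    # count vector by at most max_keys_per_user (its L1 sensitivity).
    per_user = defaultdict(set)
    for user, country, page in visits:
        key = (country, page)
        keys = per_user[user]
        if key in allowed and len(keys) < max_keys_per_user:
            keys.add(key)

    # Sorted so a seeded rng gives reproducible output across runs.
    counts = {key: 0 for key in sorted(allowed)}
    for keys in per_user.values():
        for key in keys:
            counts[key] += 1

    # Laplace mechanism: noise scale = sensitivity / epsilon gives epsilon-DP.
    scale = max_keys_per_user / epsilon
    return {key: c + laplace_noise(scale, rng) for key, c in counts.items()}


def top_pages(noisy_counts, country, k=3):
    """Post-processing: rank one country's pages by their noisy counts."""
    ranked = [(page, n) for (c, page), n in noisy_counts.items() if c == country]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return [page for page, _ in ranked[:k]]


if __name__ == "__main__":
    visits = [
        ("u1", "AU", "/home"), ("u1", "AU", "/news"),
        ("u2", "AU", "/home"), ("u3", "NZ", "/news"),
    ]
    public = [(c, p) for c in ("AU", "NZ") for p in ("/home", "/news")]
    noisy = dp_page_counts(visits, public, epsilon=1.0, rng=random.Random(0))
    print(top_pages(noisy, "AU"))
```

Ranking pages by their noisy counts is post-processing, so it consumes no additional privacy budget. A smaller `epsilon` adds more noise and gives stronger protection; a larger one gives more accurate counts at the cost of weaker guarantees.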

Google is also releasing a differential privacy tool that allows practitioners to visualise and tune the parameters used to produce differentially private information. In addition, Google researchers have published a paper that shares techniques for scaling differential privacy to datasets of a petabyte or more.