This Israeli Startup Developed Synthetic Data For Training Machine Learning Models

Data encryption is the backbone of data processing in cloud environments, but it leaves significant vulnerabilities unsolved. Even with newer methods such as homomorphic encryption, sensitive data remains exposed whenever it is processed off the company’s premises, in the cloud: hackers or rogue employees who hold the right credentials and encryption keys can access everything. With the rise of AI and model training in sectors that rely on highly sensitive inputs, such as healthcare and consumer finance, the use of sensitive data is growing, and so is the cost of a potential breach. To combat this, Kymera Labs, a new Israeli startup, has developed what it presents as a security panacea: synthetic data based on a combination of AI and cutting-edge mathematical models.

“Data was traditionally an output of code, but the process has reversed with the advent of AI today; data is now a component to develop the code,” explained Yoav Vilnai, CEO and co-founder of Kymera Labs. But the data needed to train high-quality AI models, such as credit card and banking data, is extremely sensitive and inaccessible to anyone but the in-house team. Whether under GDPR, Open Banking protocols, or banking regulations, such data is subject to intense scrutiny.

Kymera Labs’ synthetic data retains the same properties as the real data set, without biases. 

To solve that problem, Kymera Labs, founded in December 2018 by serial entrepreneurs Yoav Vilnai and Gilad Manor, developed synthetic data that mimics real sensitive data. “Our data is risk-free,” said Vilnai. Kymera’s platform gives teams access to sensitive data sets in order to accelerate AI development: it produces synthesized data that, according to the company, is free of data-sharing limitations and cannot be traced back to the original data set, while maintaining all the statistical characteristics and data dependencies of the real data. “Essentially, avoiding usage of real data in cloud environments or QA and staging environments is an ideal cyber security measurement,” added Vilnai. “Recent regulations in North America, EU and other countries forbid the usage of real data in lower environments. While current methodologies of anonymization are costly and inefficient, using synthetic data offers full data usability with maximum protection.”
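To make the idea of preserving “statistical characteristics and data dependencies” concrete, here is a minimal Python sketch. It is not Kymera’s method, which the article does not describe in detail; it simply assumes a hypothetical two-column data set (income and spending), fits a bivariate Gaussian to it, and samples fresh rows from the fitted model. The column names, the helper functions `fit` and `synthesize`, and all numbers are illustrative assumptions.

```python
import math
import random
import statistics


def fit(rows):
    """Estimate means, standard deviations, and correlation of two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / len(rows)
    return mx, my, sx, sy, cov / (sx * sy)


def synthesize(params, n, rng):
    """Draw n new rows from a bivariate Gaussian with the fitted parameters."""
    mx, my, sx, sy, rho = params
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mx + sx * z1
        # Mixing z1 into y reproduces the fitted correlation rho.
        y = my + sy * (rho * z1 + math.sqrt(1 - rho * rho) * z2)
        out.append((x, y))
    return out


rng = random.Random(42)

# Stand-in for a "real" sensitive table (hypothetical): spending depends on income.
real = []
for _ in range(5000):
    income = rng.gauss(5000, 1000)
    spending = 0.3 * income + rng.gauss(0, 200)
    real.append((income, spending))

real_params = fit(real)
synthetic = synthesize(real_params, 5000, rng)
synth_params = fit(synthetic)

print("real     :", [round(v, 2) for v in real_params])
print("synthetic:", [round(v, 2) for v in synth_params])
```

The synthetic rows are newly sampled rather than copied, yet their means, spreads and income-to-spending correlation should land close to those of the original table. A simple parametric fit like this offers no formal privacy guarantee on its own and only captures linear dependencies between numeric columns; it illustrates the goal, not a production technique.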

“Mathematically, you cannot reconstruct the [real] data from the [synthetic] data because there’s no function or encryption to generate the data,” explained Vilnai. “For many use cases, this is much better than encryption methods, which can be decrypted.” Kymera allows financial institutions to work on anything outside pure production (e.g., external analytics providers, credit scoring, training AI models) without sharing their real data.

Other companies rely on homomorphic encryption, which lets them analyze data while it remains encrypted. That is ostensibly a formidable solution, but it is limited by performance and suits only particular use cases. In fact, Google recently open-sourced its own homomorphic encryption tool to address the mounting limitations developers face around data sensitivity and security.
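For readers unfamiliar with the idea of computing on encrypted data, the following Python sketch shows a toy version of the textbook Paillier scheme, which is additively homomorphic. It is not Google’s tool or any product mentioned here; the tiny primes, the constant names and the `encrypt`/`decrypt` helpers are assumptions chosen purely for illustration and are far too small to be secure.

```python
import math
import random

# Toy key material: real deployments use primes that are thousands of bits long.
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
G = N + 1                                           # common simplified generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)                                # modular inverse (Python 3.8+)


def encrypt(m, rng):
    """Encrypt an integer 0 <= m < N using fresh randomness r."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c):
    """Recover the plaintext using the private values LAM and MU."""
    u = pow(c, LAM, N_SQ)
    return ((u - 1) // N * MU) % N


rng = random.Random(7)
c1 = encrypt(1200, rng)
c2 = encrypt(345, rng)

# Multiplying ciphertexts adds the hidden plaintexts (modulo N).
c_sum = (c1 * c2) % N_SQ
total = decrypt(c_sum)
print(total)
```

The party that multiplies `c1` and `c2` never sees 1200 or 345, yet the key holder decrypts their sum. Even in this toy form, every operation is a modular exponentiation on numbers the size of the squared modulus, which hints at the performance cost the article mentions.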

The startup’s technology is based on a new algebra-based mathematical model that Manor and Vilnai developed over the last two years. According to the company, its methodology fully maintains data features without introducing bias. One of its key advantages is the ability to synthesize not only scalars but also labels, addresses and dates while keeping data dependencies.

“We envision synthetic data to fill the testing environments and cloud environments for financial and insurance companies where real data should never be found,” explained Manor. “Real data should only be used for production purposes and eventually, every company will have a real data repository as well as a synthetic data repository.”

The synthetic data can serve a variety of use cases, such as machine learning model training, low-code environments, or Open Banking.

Kymera Labs’ team (left to right): Noa Srebrnik (VP Product), Gilad Manor (CTO) and Yoav Vilnai (CEO).

Kymera Labs raised $1 million from Peregrine Venture Capital in early 2018, as well as from other private investors, and plans to raise its Series A round later this year.

Kymera is already working with local banks in Israel. Although financial entities have made AI part of their core agenda, concerns about data security and functional usability still linger. Kymera positions its synthetic data as a means for rapid innovation without the risk, with a chance to disrupt traditional encryption altogether.