privacyguides.org/blog/posts/homomorphic-encryption.md
2025-07-27 09:44:50 -05:00

date: 2025-07-24T19:00:00Z (created)
categories: Explainers
authors: fria
tags: Privacy-Enhancing Technologies, Homomorphic Encryption
license: BY-SA
schema_type: BackgroundNewsArticle
description: We rely on services that process our data every day. It's accepted that in order to process our data, the service needs to see it in plaintext, however homomorphic encryption aims to change that by bringing E2EE to serverside processing.

Privacy-Enhancing Technologies Series: Homomorphic Encryption

We rely on lots of server-facing services in our day-to-day lives, whether it's using server-side AI like ChatGPT or querying search engines. It's just assumed that when we use services like this, they need to process our data in the clear. With homomorphic encryption, though, data can be processed server-side while remaining end-to-end encrypted (E2EE).

Privacy Violations

We've surrendered much of our lives to the services we use every day, from music and video streaming to web search to AI services. Even things we don't typically think of as online services, like buying things at the store, typically query a database.

AOL Search Data Release

In 2006, AOL thought it would be a great idea to release the searches of over 650,000 users. The data was "anonymized" by scrubbing the users' real names and replacing them with numbers.

Simply using user No. 4417749's searches such as "homes sold in shadow lake subdivision gwinnett county georgia" and "60 single men", journalists at the New York Times were able to re-identify the user as Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga. They even went to her house to meet her.

Needless to say, search engines keeping enough data to send someone straight to your house is terrifying, and that's just what they released willingly.

Yahoo!

In 2013, the search engine Yahoo! experienced a data breach affecting "all 3 billion accounts". All of those users risk re-identification just like the AOL users did, except this time there was no attempt at anonymizing them.

Equifax

The credit bureau Equifax, which handles sensitive data such as SSNs, was breached in 2017, exposing extremely sensitive data of 147 million people. The company settled for $425 million in damages, but it's hard to put a number on the devastation of having your identity stolen. Stronger protections for sensitive financial data need to be put in place to avoid situations like this.

MediSecure

In 2024, the personal health information of "2.9 million Australians who used the MediSecure prescription delivery service during the approximate period of March 2019 to November 2023" was breached. The data included extremely sensitive personal and health details such as:

  • full name
  • title
  • date of birth
  • gender
  • email address
  • address
  • phone number
  • individual healthcare identifier (IHI)
  • Medicare card number, including individual identifier, and expiry
  • Pensioner Concession card number and expiry
  • Commonwealth Seniors card number and expiry
  • Healthcare Concession card number and expiry
  • Department of Veterans Affairs (DVA) (Gold, White, Orange) card number and expiry
  • prescription medication, including name of drug, strength, quantity and repeats
  • reason for prescription and instructions

An absolutely devastating breach of user privacy by any metric.

OpenAI

When services process our data in the clear, we run the risk not only of the service itself abusing its access to that data, but also of court orders legally requiring it to retain data.

OpenAI was required to retain all ChatGPT user logs, even deleted ones. This is devastating for user privacy when you consider that ChatGPT handles over 1 billion queries per day.

This is a clear violation of user privacy, and it happened out of the blue in a lawsuit that wasn't even related to user privacy. When companies have access to our data, it might not even be up to them how it's handled sometimes. This is why E2EE is so important: it's not only about trust but about making it so services can't access data even if they tried.

Beginnings of Homomorphic Encryption

As is typical, the first mention of homomorphic encryption comes from a 1978 paper titled ON DATA BANKS AND PRIVACY HOMOMORPHISMS.

It's funny seeing the concerns of the time. A given example is a loan company that uses a time share (sharing a computer with others and having a limited time window to do your computing) with another company and how they have to choose between that and getting their own computer. With companies now moving more and more of their own infrastructure to cloud services provided by other companies, it seems we've come full circle.

One of the suggestions is to use modified hardware that can decrypt data for the CPU to process. The idea of using secure hardware to protect user data is currently in use through Confidential Computing and the use of secure enclaves in the CPU to separate out the data of different users.

The second solution they propose doesn't require decrypting user data at all; they call it "privacy homomorphisms". The examples they give theoretically allow for addition, subtraction, multiplication, and division on encrypted data, although they state in the paper that many of them are likely not secure.

Notably, the schemes mentioned allow only for either addition and subtraction or multiplication and division, which means if you want to do both you need to decrypt the data.
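To make the "one family of operations or the other" limitation concrete, here is a minimal sketch using two well-known schemes that each support a single operation on ciphertexts. They are stand-ins for illustration, not a reproduction of the paper's examples: textbook (unpadded) RSA is multiplicative, and Paillier (published later, in 1999) is additive. The tiny primes, the fixed randomizers, and the function names are assumptions chosen for readability; nothing here is secure.

```python
# Toy parameters only: real keys use primes that are hundreds of digits long.
P, Q = 61, 53
N = P * Q                       # public modulus
PHI = (P - 1) * (Q - 1)

# --- Textbook RSA: multiplying ciphertexts multiplies the plaintexts ---
E = 17                          # public exponent, coprime to PHI
D = pow(E, -1, PHI)             # private exponent (modular inverse, Python 3.8+)

def rsa_encrypt(m):
    return pow(m, E, N)

def rsa_decrypt(c):
    return pow(c, D, N)

# --- Paillier with g = N + 1: multiplying ciphertexts adds the plaintexts ---
N2 = N * N
MU = pow(PHI, -1, N)            # decryption constant

def paillier_encrypt(m, r):
    # r is the randomizer; it must be coprime to N and would be random in practice
    return (1 + m * N) * pow(r, N, N2) % N2

def paillier_decrypt(c):
    return (pow(c, PHI, N2) - 1) // N * MU % N

# The server only ever multiplies ciphertexts; it never sees 7 or 6.
product = rsa_decrypt(rsa_encrypt(7) * rsa_encrypt(6) % N)
total = paillier_decrypt(paillier_encrypt(7, r=17) * paillier_encrypt(6, r=23) % N2)

print(product)  # 42, i.e. 7 * 6
print(total)    # 13, i.e. 7 + 6
```

Each scheme gives the server exactly one useful operation. Computing something like `a * b + c` would require decrypting in between, which is the gap fully homomorphic encryption set out to close.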

Despite the shaky security of these early schemes, they would lay the groundwork for the field going forward.

Fully Homomorphic Encryption

It wasn't until 2009 that the idea of homomorphic encryption would be improved on, in A FULLY HOMOMORPHIC ENCRYPTION SCHEME by Craig Gentry.

This paper introduced fully homomorphic encryption, which allows for both addition and multiplication, meaning it can now theoretically be used for any calculation.

The scheme relies on a small amount of random "noise" injected into each ciphertext. Every operation increases that noise: addition grows it a little, while multiplication amplifies it quite a bit. Decryption only gives an accurate answer while the noise stays below a certain threshold.

This limits how many operations can be done on the numbers before they become too noisy to use.
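The noise budget is easier to see in a toy. The sketch below is not Gentry's lattice-based construction; it swaps in a much simpler symmetric scheme over the integers (in the style of later "FHE over the integers" work) purely to show how noise behaves. The key `p`, the explicit `r` and `q` arguments, and the function names are illustrative assumptions, and the numbers are far too small to be secure.

```python
# Toy symmetric scheme over the integers (NOT Gentry's lattice construction).
# Secret key: an odd integer p. A bit m is hidden as c = p*q + 2*r + m.
# The term 2*r + m is the "noise"; decryption works while |noise| < p/2.

def encrypt(bit, p, r, q):
    """Encrypt one bit. r (small) and q (large) would be random in practice."""
    return p * q + 2 * r + bit

def noise(c, p):
    """Signed distance from c to the nearest multiple of p (requires the key)."""
    x = c % p
    return x - p if x > p // 2 else x

def decrypt(c, p):
    """Recover the bit as the parity of the noise term."""
    return noise(c, p) % 2

p = 9999                          # toy secret key
c1 = encrypt(1, p, r=3, q=5)      # encrypts 1 with noise 7
c0 = encrypt(0, p, r=4, q=9)      # encrypts 0 with noise 8

added = c1 + c0                   # noise terms add:      7 + 8 = 15
multiplied = c1 * c0              # noise terms multiply: 7 * 8 = 56

print(noise(added, p), decrypt(added, p))            # bit is 1 XOR 0
print(noise(multiplied, p), decrypt(multiplied, p))  # bit is 1 AND 0

# Repeated multiplication squares the noise each time: 7, 49, 2401, ...
x = c1
for depth in range(1, 4):
    x = x * x
    print(depth, noise(x, p), decrypt(x, p))
```

With these toy numbers, the first two squarings still decrypt to 1, but the third pushes the true noise (7 to the 8th power) past p/2, and the result decrypts to 0 even though the plaintext is still 1. That is the threshold the scheme has to stay under.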

However, it's possible to "bootstrap" after it gets too noisy, resetting the noise to below the threshold. This gives this scheme the ability to do as many operations as you want since you can just keep resetting the noise.

It's based on ideal lattices because they have some useful properties allowing for more efficient key generation and algebraic operations. Because it's based on lattices, it's considered quantum resistant as well (there's no known efficient algorithm, classical or quantum, to solve the underlying lattice problems).

Unfortunately, these early homomorphic encryption schemes weren't very performant, taking up to 30 minutes per bootstrapping operation. Obviously, this is not ideal, and it prevented the scheme from being used for any real-world tasks.

Second Generation FHE

Over the next few years, several papers would chip away at the inefficiencies of Gentry's original scheme, including finding ways to manage noise better.

Even with all of these improvements, the second-generation fully homomorphic schemes would still rely on bootstrapping a partially homomorphic scheme.

Researchers were able to achieve fully homomorphic encryption using arbitrary lattices instead of ideal lattices, thanks to a new re-linearization technique. They were also able to remove the squashing step, improving efficiency and reducing the number of assumptions that have to be made.

A later paper introduced leveled homomorphic encryption, which can evaluate circuits up to a depth chosen in advance without bootstrapping. It introduced modulus switching as an improved noise management technique, and it used bootstrapping as an optimization rather than a requirement for achieving fully homomorphic encryption. The authors also introduced batching, which packs multiple plaintexts into one ciphertext so that multiple inputs can be evaluated with the same efficiency as one.
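Batching can be pictured with a plain-number analogy. The sketch below is not an encryption scheme and is not taken from the paper: it only uses the Chinese Remainder Theorem to pack several small values ("slots") into one integer, so that a single addition or multiplication on the packed integers acts on every slot at once. Batching in FHE schemes builds on a related idea inside the scheme's plaintext space. The moduli and the `pack`/`unpack` names are illustrative assumptions.

```python
from math import prod

MODULI = [5, 7, 11]          # pairwise coprime; slot i holds a value mod MODULI[i]
M = prod(MODULI)             # packed values live modulo the product

def pack(values, moduli=MODULI):
    """Combine one value per slot into a single integer via the CRT."""
    total = 0
    for v, m in zip(values, moduli):
        rest = prod(moduli) // m
        # rest * inverse(rest) is 1 in this slot and 0 in every other slot
        total += v * rest * pow(rest, -1, m)
    return total % prod(moduli)

def unpack(packed, moduli=MODULI):
    """Read every slot back out of the packed integer."""
    return [packed % m for m in moduli]

a = pack([1, 2, 3])
b = pack([4, 5, 6])

# One operation on the packed integers updates all three slots together.
print(unpack((a + b) % M))   # slot-wise sums, each reduced by its own modulus
print(unpack((a * b) % M))   # slot-wise products, each reduced by its own modulus
```

The cost of the single packed addition or multiplication does not depend on how many slots are in use, which is the efficiency gain batching is after.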