Is it legal to scrape public data to build my product?

Scraping public data to build a product is usually not banned, but the thing most builders miss is that public personal data is still personal data. Under the EU's GDPR you generally need a lawful basis to process it, and you owe the people whose data you collected a notice, normally within one month. The wall here is not the scrape itself. It is transparency. A name or email being visible on a website does not place it outside the law.

Tovrio engine result card: a B2B app that scrapes public web and social profiles to build a sales lead database, tested with a solo founder based in the EU, returns signal NEEDS REVIEW with a validate-first verdict. The rule EU-GDPR-ART14 fired: GDPR Article 14 requires giving notice to people whose personal data you collected from a source other than themselves.

The card above is a real result from a compliance check. The idea tested was a synthetic example: a B2B app that scrapes public web and social media profiles to build a sales lead database of business contacts. The profile was a solo founder based in the EU on a small budget. The result was a needs review signal, not a blocked one, because scraping public data is a conditional activity rather than a forbidden one. The condition is the duty explained below.

Why public data is still personal data

GDPR defines personal data as any information relating to an identified or identifiable person, and nothing in that definition turns on whether the data was public. A business email, a name next to a job title, a public profile: each one relates to a person, so each one is personal data. Collecting it by scraping does not change what it is.

This is the assumption that trips up most data products. "It was already public" feels like permission. Under GDPR it is not. Public availability can be relevant to your lawful basis analysis, but it does not exempt the data from the regulation or remove the duties that come with holding it.

One scope note for this article: it covers the privacy law side of scraping. Whether a site's own terms of service or a computer access law such as the US Computer Fraud and Abuse Act permit the scrape is a separate question, and one worth checking alongside the privacy duties here.

Source: GDPR Article 4, definitions.

The notice duty most scrapers miss: GDPR Article 14

When you collect personal data from a source other than the person themselves, GDPR Article 14 requires you to give those people the privacy information directly: who you are, what you are doing with their data, your lawful basis, where the data came from, and their rights. This is the article that applies specifically to scraping, buying lists, and any indirect collection. Article 13 covers data you collect straight from the user. Article 14 covers data you collect about them from somewhere else.

The timing is the part that surprises people. The notice is due within a reasonable period and at the latest one month after you obtain the data. If you use the data to contact the person first, the notice is due at that first contact. If you disclose the data to someone else first, it is due by that first disclosure.
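The timing rule above can be sketched as a small date calculation. This is a minimal illustration, not a statement of how a regulator would count the period: the function names `add_one_month` and `notice_due_by` are hypothetical, "one month" is assumed to mean the same calendar day in the following month (clamped to the month's last day), and the "reasonable period" part of the rule is not modelled at all.

```python
import calendar
from datetime import date
from typing import Optional


def add_one_month(d: date) -> date:
    """Return the same day next month, clamped to that month's last day.

    Assumption: this is one plausible reading of "one month", chosen for
    illustration only.
    """
    year = d.year + (1 if d.month == 12 else 0)
    month = 1 if d.month == 12 else d.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def notice_due_by(
    obtained: date,
    first_contact: Optional[date] = None,
    first_disclosure: Optional[date] = None,
) -> date:
    """Return the earliest of the three outer limits described above."""
    candidates = [add_one_month(obtained)]
    if first_contact is not None:
        candidates.append(first_contact)
    if first_disclosure is not None:
        candidates.append(first_disclosure)
    return min(candidates)
```

For example, `notice_due_by(date(2024, 3, 10), first_contact=date(2024, 3, 20))` returns `date(2024, 3, 20)`, because the first contact comes before the one-month limit. The point of the sketch is that the deadline is whichever trigger arrives first, so a planned outreach campaign can pull the notice date forward.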

There is a narrow exemption. Where giving individual notice would involve a disproportionate effort, you can be relieved of the direct notice duty, but you then have to make the information publicly available instead. The exemption is not a way to skip transparency. It changes how you deliver it, not whether you owe it.

One thing to keep separate so this is not misread: notice is one duty, and a lawful basis is another. To process scraped personal data at all you still need a lawful basis under Article 6, which for scraping is often legitimate interests, and that carries its own balancing test against the rights of the people involved. Special category data such as health, biometrics, or political opinions sits under stricter Article 9 conditions. Article 14 tells you to inform people. It does not by itself make the processing lawful.

Source: GDPR Article 14, information where data not obtained from the data subject.

The myth versus what the law says

|  | The common assumption | What GDPR actually requires |
| --- | --- | --- |
| Status of the data | Public, so fair game | Public personal data is still personal data |
| Permission to use it | None needed, it was visible | A lawful basis under Article 6, often legitimate interests with a balancing test |
| Telling the people | No need, you never contacted them | Article 14 notice, normally within one month |
| If notice is hard | Skip it | Disproportionate effort relieves direct notice but you must publish the information instead |

What you still owe after you collect

The duties do not end at notice. People whose data you hold keep their GDPR rights, including the right to erasure under Article 17, often called the right to be forgotten. If you store scraped personal data in a way that makes deletion hard, for example in immutable logs or append only history, you have built yourself a compliance problem. Designing for erasure from the start is far cheaper than retrofitting it.
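One way to picture "designing for erasure" is to keep personal fields in a single mutable table and let any append-only history refer to people only by an opaque ID. The sketch below is a hypothetical in-memory model, not a recommended architecture: the class `LeadStore` and its methods are invented for illustration, and whether an opaque ID left behind in a log still counts as personal data is a question this sketch does not settle and is worth raising with a data protection professional.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LeadStore:
    # Mutable table: the only place personal fields live.
    people: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # Append-only history: holds opaque subject IDs, never names or emails.
    events: List[Dict[str, str]] = field(default_factory=list)

    def add_person(self, name: str, email: str) -> str:
        """Store personal fields under a random ID and log the collection."""
        subject_id = uuid.uuid4().hex
        self.people[subject_id] = {"name": name, "email": email}
        self.events.append({"subject_id": subject_id, "event": "collected"})
        return subject_id

    def lookup(self, subject_id: str) -> Optional[Dict[str, str]]:
        """Return the personal fields, or None if absent or erased."""
        return self.people.get(subject_id)

    def erase(self, subject_id: str) -> bool:
        """Delete the personal fields; return False if nothing was held."""
        removed = self.people.pop(subject_id, None) is not None
        if removed:
            self.events.append({"subject_id": subject_id, "event": "erased"})
        return removed
```

Because the name and email exist in exactly one place, an erasure request becomes a single delete rather than a rewrite of history. The same idea carries over to a real database, where the history table would hold a foreign key instead of copied contact details.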

You also take on the ordinary obligations of holding personal data: a lawful basis you can point to, security safeguards, a retention limit, and a route for people to exercise their rights. None of this is a ban on the product. It is the operating cost of building on personal data, and it is worth pricing in before you commit.

Source: GDPR Article 17, right to erasure.

Does this apply outside the EU?

Often yes. GDPR reaches processing tied to offering goods or services to people in the EU or monitoring their behaviour, regardless of where you sit. Scraping the personal data of people in the EU can bring you in scope as a non-EU founder. The duty does not move off your plate because your company is incorporated elsewhere.

The same shape appears in other regimes. US state privacy laws such as California's require a notice at the point of collection. A growing number of states also require data brokers, meaning businesses that collect and sell personal data about people they have no direct relationship with, to register and honour deletion requests. The labels differ. The underlying idea, that collecting people's data quietly is not free, keeps showing up.

If you are a solo founder on a small budget

For a solo founder, the cost of a scraped personal data product is not the scraping. It is the compliance tail. Article 14 notice at scale, a defensible lawful basis, erasure on request, and security duties are real work, and they grow with the size of your dataset. That is why a compliance check returns this kind of idea as needs review rather than a clean pass.

If the compliance tail is too heavy for your stage, these are not workarounds. They are different products that lower or remove the duty:

  • Scrape and process only non-personal data, such as aggregate or company level facts that do not identify a person.
  • Collect data directly from users who sign up, which moves you to the Article 13 notice, given at the point of collection, and makes a lawful basis such as consent easier to establish.
  • Build for a customer who already holds the data and the lawful basis, as a B2B tool, rather than assembling the dataset yourself.

Each of these changes what you are holding, which is what changes the duty.

How to validate before you build

The result shown above came from Tovrio, a compliance check that runs an idea against country specific rules before you write code. The idea tested was a synthetic case, not a real user. The result was a needs review signal with the reasons named here.

This is a validate before you build signal, not legal advice. A flag means "go confirm this with a data protection professional before you commit," not "your specific plan is definitely unlawful." You can run your own idea through it.

Frequently asked questions

Is scraping publicly available data legal under GDPR?

Scraping public data is generally not banned, but public personal data is still personal data under GDPR. You need a lawful basis to process it, and you owe the people the transparency information in Article 14. Public availability does not remove either duty.

Do I need consent to scrape personal data, or just to notify people?

They are two different things. Notice under Article 14 is owed whenever you collect personal data from a source other than the person, subject only to narrow exemptions. Separately you need a lawful basis to process at all, which for scraping is often legitimate interests with its own balancing test. Special category data sits under the stricter Article 9 conditions, of which explicit consent is one. Notice alone does not make the processing lawful.

Does publicly available mean the data is exempt from privacy law?

No. A name, email, or profile that someone can see online is still personal data. GDPR applies to personal data regardless of whether it was public. The fact that you found it on a website does not place it outside the regulation.

Do I really have to notify every person whose data I scraped?

Generally yes, normally within one month of collecting the data, or sooner if you first contact them or share the data onward. There is a narrow exemption where individual notice would take disproportionate effort, but even then you must make the information publicly available. It is not a free pass.

Does GDPR apply if I am not based in the EU?

It can. GDPR reaches processing related to offering goods or services to people in the EU or monitoring their behaviour, regardless of where you are located. Scraping the personal data of people in the EU can bring you in scope even as a non-EU founder.

Is building a B2B leads database from scraped data legal?

It is not automatically illegal, but business contacts are usually still personal data, so the Article 14 notice duty and a lawful basis both apply. A B2B framing does not switch off GDPR. Confirm your basis and your notice approach with a data protection specialist before you sell the database.
