You have already given away more than you realise.
Not through a data breach. Not through anything illegal. Through every search you typed, every product you clicked, every location your phone logged, every form you filled, every photo you uploaded, every article you read longer than thirty seconds.
Somewhere, all of that is stored. It has been sold. It has been aggregated. It has been fed into AI models that now know things about you — your income bracket, your health concerns, your relationship status, your political leanings, your insecurities — that you have never explicitly told anyone.
This is not paranoia. This is how the data economy works. And AI has made it significantly more powerful, more precise, and more difficult to opt out of.
The question is not whether your data is being collected. It is. The question is what you can reasonably do about it, and how to prioritise your effort so the actions you take actually matter.
This article covers:
- What AI does with your data that older systems could not
- The five categories of personal data most at risk in 2026
- Practical protection steps ranked by effort and impact
- What data brokers are and how to remove yourself
- How to use AI tools without feeding them everything
- The honest limits of what personal action can achieve

What AI Changed About Data Collection
Data collection is not new. Companies have collected customer information for decades.
What changed is what they can do with it.
Ten years ago, a retailer knew you bought running shoes twice last year. That was useful for sending you a discount coupon.
In 2026, an AI system sees that purchase alongside your search history, your location data, your social media activity, the articles you read, the products you looked at but did not buy, the time of day you browse, and the device you use.
It infers that you recently started running because you are trying to lose weight. It cross-references that with your pharmacy loyalty card data to estimate whether this is a health concern. It adjusts the price you see for health insurance. It influences which job listings appear for you. It affects what content your social feed prioritises.
You bought running shoes. The system built a health profile.
How AI inference works on ordinary data:
Raw data points collected:
- Searched "best running shoes for beginners"
- Bought running shoes in March
- Location data: nearby park, 6 AM, three times per week
- Searched "shin splint treatment"
- Purchased ibuprofen and compression socks
- Read articles: "running for weight loss", "beginner 5K plan"
- Decreased fast food delivery orders
- Increased grocery orders including protein foods
What AI infers:
- Started running programme approximately March
- Experiencing minor injury (shin splints)
- Motivated by weight loss
- Likely health-conscious lifestyle shift
- Estimated BMI range based on purchase patterns
- Probability: considering gym membership (87%)
- Probability: will search health insurance in 60 days (64%)
None of this was told to any company. All of it was inferred from data freely given.
This is the fundamental shift. It is not that AI collects more data than older systems. It is that AI extracts dramatically more meaning from the same data — inferring things you never disclosed from patterns across things you did.
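To make the inference step concrete, here is a minimal sketch in Python. It assumes a hypothetical set of hand-written keyword rules. Real profiling systems rely on statistical models trained on far more data, so the rule names, keywords, and thresholds below are illustrative inventions and not any company's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # e.g. "search", "purchase", "article"
    detail: str  # free-text description of what was observed


# Hypothetical rules: an inference fires when at least `min_hits` of its
# keywords appear somewhere in the event history. All values are made up.
RULES = {
    "started running": (["running shoes", "5k plan"], 2),
    "minor injury (shin splints)": (["shin splint", "ibuprofen", "compression socks"], 2),
    "motivated by weight loss": (["weight loss"], 1),
}


def infer(events):
    """Return the labels of every rule satisfied by the combined events."""
    text = " | ".join(e.detail.lower() for e in events)
    inferences = []
    for label, (keywords, min_hits) in RULES.items():
        hits = sum(1 for keyword in keywords if keyword in text)
        if hits >= min_hits:
            inferences.append(label)
    return inferences


events = [
    Event("search", "best running shoes for beginners"),
    Event("purchase", "running shoes"),
    Event("search", "shin splint treatment"),
    Event("purchase", "ibuprofen and compression socks"),
    Event("article", "running for weight loss"),
    Event("article", "beginner 5K plan"),
]
print(infer(events))
```

No single event states a health fact. The labels only appear once the events are combined, which is the point of the running shoes example.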
KEY FACT: A 2023 study by Duke University found that data brokers were selling detailed mental health data — including lists of people with depression, anxiety disorders, and PTSD — without any consent mechanism. The data was inferred from browsing patterns, purchase history, and location data, not from medical records. AI inference made medical-grade sensitive information derivable from entirely non-medical sources.
The Five Categories of Data Most at Risk
Not all data is equally sensitive. Understanding which categories matter most helps you prioritise where to spend effort.
Category 1 — Location Data
Your location history is more revealing than almost anything else about you.
Where you sleep every night identifies your home. Where you go every weekday morning identifies your employer. Regular visits to a specific clinic identify a health condition. Time spent at a legal office, a marriage counsellor, or a religious institution reveals things you have never posted publicly.
What location data reveals:
Regular pattern:                  What it implies:
──────────────────────────────────────────────────────
Same address 10 PM - 6 AM         Home address
Same location weekdays 9-5        Employer
Weekly visits, medical clinic     Health condition or treatment
Monthly visit, law firm           Legal matter
Friday evenings, specific bar     Social habits, possible religion
Overnight stays, not home         Relationship status changes
Visits to political HQ            Political affiliation
Fertility clinic visits           Family planning decisions
Location data is collected by apps (weather, maps, games, fitness trackers), mobile carriers, and the devices themselves. It is one of the most traded categories on the data broker market.
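As a rough illustration of how little analysis is needed to find a home address, the sketch below picks the place a device is most often seen at night. The function name, the night-time window, and the sample coordinates are all assumptions made for this example. Real location analytics are considerably more sophisticated.

```python
from collections import Counter
from datetime import datetime


def infer_home(pings, night_start=22, night_end=6):
    """Return the most common rounded coordinate seen during night hours.

    `pings` is an iterable of (timestamp, latitude, longitude) tuples.
    """
    night_cells = Counter()
    for timestamp, lat, lon in pings:
        hour = timestamp.hour
        if hour >= night_start or hour < night_end:
            # Rounding to 3 decimal places groups nearby points into cells
            # on the order of 100 metres across.
            night_cells[(round(lat, 3), round(lon, 3))] += 1
    if not night_cells:
        return None
    return night_cells.most_common(1)[0][0]


# Made-up pings: three at night near one spot, two during the day elsewhere.
pings = [
    (datetime(2026, 1, 5, 23, 10), 51.50012, -0.12034),
    (datetime(2026, 1, 6, 2, 30), 51.50021, -0.12041),
    (datetime(2026, 1, 6, 9, 15), 51.51500, -0.09000),
    (datetime(2026, 1, 6, 13, 0), 51.51510, -0.09010),
    (datetime(2026, 1, 6, 22, 45), 51.49990, -0.11980),
]
print(infer_home(pings))
```

Swapping the night-time window for weekday working hours would give a comparable guess at an employer, which is why approximate location and "while using" permissions reduce what can be derived.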
What actually helps:
- Turn off location services for apps that do not genuinely need them (most apps do not)
- Use “while using” rather than “always on” location permissions
- Disable “precise location” — approximate location is sufficient for most legitimate uses
- Turn off Wi-Fi and Bluetooth when not in use — both can be used to track location without GPS
Category 2 — Health and Biometric Data
Health data has always been sensitive. AI makes it derivable from sources that are not obviously medical.
Your purchase history, search history, app usage patterns, and location data can collectively reveal health conditions you have never disclosed anywhere. There is well-documented evidence of AI systems making these inferences accurately and at scale.
Biometric data — fingerprints, face scans, voice prints — is categorically different from other data. If your password is compromised, you change your password. If your fingerprint data is compromised, you cannot change your fingerprints.
What actually helps:
- Do not enroll your biometrics in systems where it is optional (retail loyalty programmes, third-party apps)
- For health apps, check whether data is shared with third parties before granting access to health data
- Use symptom checkers and health search queries on a browser with a VPN if the topic is sensitive
- Review what your health and fitness apps share — most share more than users realise
Category 3 — Financial Behaviour
Bank account numbers and credit card details are protected by law in most jurisdictions. But the behavioural data around your finances — what you buy, when, where, how much you spend — is mostly unprotected and extensively collected.
AI analysis of financial behaviour patterns can reveal employment changes, relationship shifts, mental health states, addiction patterns, and financial stress with significant accuracy.
What actually helps:
- Use a credit card rather than a debit card for online purchases — limits direct bank account exposure
- Use virtual card numbers (most major banks offer these) for subscriptions and unfamiliar online retailers
- Regularly review app access to your bank account through open banking permissions
- Be aware that “buy now pay later” services often have very extensive data sharing policies
Category 4 — Communications and Relationships
Who you communicate with, how often, and at what times is metadata. The content of your messages may be encrypted. The pattern of communication is rarely protected.
AI analysis of communication metadata — who contacts whom, how frequently, at what hours — reveals relationship networks, professional hierarchies, and social dynamics without reading a single message.
What communication metadata reveals:
Without reading a single message:
- Who your closest relationships are (contact frequency)
- When relationships start and end (communication patterns change)
- Professional vs personal relationships (timing, frequency)
- Stress events (communication pattern disruptions)
- Network of relationships (who knows whom through you)
This is well documented. The NSA's bulk telephone records programme was defended as collecting "only metadata", yet former senior NSA officials have publicly acknowledged that metadata can be as revealing as content in many cases.
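The sketch below shows the idea with a toy contact log that holds only who was contacted and at what hour. The function names, the late-night window, and the sample log are hypothetical, and no message content is involved at any point.

```python
from collections import Counter


def closest_contacts(contact_log, top_n=3):
    """Rank contacts by number of interactions; content is never examined."""
    counts = Counter(contact for contact, _hour in contact_log)
    return [name for name, _count in counts.most_common(top_n)]


def late_night_contacts(contact_log, start=23, end=5):
    """Contacts reached between `start` and `end` o'clock, sorted by name."""
    return sorted({c for c, hour in contact_log if hour >= start or hour < end})


# Each entry is (contact, hour of day). Entirely made-up data.
contact_log = [
    ("alice", 9),
    ("bob", 23),
    ("alice", 14),
    ("bob", 1),
    ("bob", 20),
    ("carol", 10),
]
print(closest_contacts(contact_log, top_n=2))
print(late_night_contacts(contact_log))
```

Frequency alone surfaces the strongest tie, and timing hints at whether it is personal or professional. End-to-end encryption protects the message body but does not hide this pattern from the service carrying the messages.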
What actually helps:
- Use end-to-end encrypted messaging for sensitive conversations (Signal is the gold standard)
- Understand that regular SMS and most email are not meaningfully private
- Be aware that email providers that scan content for advertising (Gmail historically) see everything in your inbox
- Be mindful that even private messages sent through social platforms may be used for ad targeting
Category 5 — Identity and Credentials
This is the category with the most direct, immediate risk — because credential theft leads to account takeover, which leads to fraud, identity theft, and cascading access to everything else.
AI has made credential attacks more efficient (as covered in the AI cyber attacks article). The protection principles are well established and genuinely effective when applied consistently.
What actually helps:
- Unique password for every account — a password manager makes this achievable
- MFA on every account that offers it — app-based (Google Authenticator, Authy) is significantly better than SMS
- Hardware security key for your most important accounts (email, banking, work accounts)
- Regular checks on HaveIBeenPwned.com — paste your email to see if it appears in known breaches
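Authenticator apps such as Google Authenticator generally implement time-based one-time passwords (TOTP, RFC 6238). The sketch below shows the core calculation using only the Python standard library. It is meant to illustrate why app-based codes do not depend on the phone network, and it is not a replacement for a maintained library. The function name is this example's own.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1 variant) for a given time."""
    # The moving factor is the number of `step`-second intervals since the epoch.
    counter = unix_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte pick an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Test secret taken from the RFC 6238 appendix; never use it for a real account.
rfc_secret = b"12345678901234567890"
print(totp(rfc_secret, 59, digits=8))
```

The code is derived from a secret stored on the device plus the current time, so nothing is sent by text message where it could be intercepted or redirected through a SIM swap.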

The Data Broker Problem
Most people have never heard of data brokers. They are companies whose entire business model is collecting, aggregating, and selling personal data — and they are one of the largest privacy threats that receives the least public attention.
How data brokers work:
Data broker data sources:
Public records (legal, accessible):
- Property records (home ownership, purchase price)
- Court records (lawsuits, criminal history, bankruptcies)
- Voter registration (name, address, party affiliation)
- Business registrations
- Marriage and divorce records
Commercial data (purchased from other companies):
- Retail loyalty programme purchase history
- Warranty registration data
- Online purchase data
- Financial transaction data (from payment processors)
Scraped data (from public internet):
- Social media profiles and posts
- Forum and review site activity
- Professional profiles (LinkedIn)
- News mentions
Inferred data (AI-generated):
- Income estimates
- Health condition probabilities
- Political affiliation scores
- Personality profiles
- Purchase intent scores
Combined result: a profile that knows your name, address, employer, relatives, income estimate, health inferences, political views, purchase history, and relationship network, sold to anyone willing to pay.
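The "combined result" step is essentially record linkage: records from unrelated sources are joined on a shared key. A minimal sketch follows, assuming a crude name-plus-postcode key and invented field names. Real brokers use far more robust matching than this.

```python
def normalise(name, postcode):
    """Build a crude join key from a name and postcode (illustrative only)."""
    return (" ".join(name.lower().split()), postcode.replace(" ", "").upper())


def merge_profiles(*sources):
    """Fold records from several sources into one profile per join key."""
    profiles = {}
    for source in sources:
        for record in source:
            key = normalise(record["name"], record["postcode"])
            profile = profiles.setdefault(key, {})
            for field, value in record.items():
                if field not in ("name", "postcode"):
                    profile[field] = value
    return profiles


# Hypothetical records about the same person from three kinds of source.
public_records = [{"name": "Jane Doe", "postcode": "AB1 2CD", "home_owner": True}]
retail_data = [{"name": "jane  doe", "postcode": "ab12cd", "buys": "running shoes"}]
inferred_data = [{"name": "Jane Doe", "postcode": "AB1 2CD", "income_band": "estimated"}]

merged = merge_profiles(public_records, retail_data, inferred_data)
print(merged)
```

Each source on its own is unremarkable. The merged profile is what gets sold.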
Who buys this data:
- Advertisers (the primary market)
- Insurance companies (assessing risk)
- Employers (background research beyond official checks)
- Landlords (tenant screening)
- Law enforcement (without a warrant in many jurisdictions)
- Scammers (the criminal market for data broker lists is significant)
- Political campaigns (targeting and persuasion)
How to remove yourself from data brokers:
This is a significant undertaking. There are over 200 data broker companies in operation in 2026. Each has its own opt-out process. Many require you to submit a copy of your ID to “verify” your identity for removal — which itself gives them more data.
Data broker removal — practical approach:
Tier 1 — High priority, opt out yourself:
- Spokeo.com → spokeo.com/optout
- WhitePages.com → whitepages.com/suppression_requests
- BeenVerified.com → beenverified.com/opt-out
- Intelius.com → intelius.com/optout
- PeopleFinders.com → peoplefinders.com/manage/
Tier 2 — Use a removal service:
- DeleteMe, Kanary, Optery, Privacy Bee
- Monthly subscription ($10-$20/month)
- Handles ongoing removal across 100+ brokers
- Worth it if you are a high-profile target or concerned about stalking/harassment risk
Tier 3 — GDPR/CCPA legal rights (where applicable):
- EU residents: GDPR Article 17 "right to erasure"
- California residents: CCPA opt-out rights
- Send formal requests using your legal name and address
- Companies must respond within 30-45 days
Reality check: removal is not permanent. Data brokers re-acquire data regularly. Removal requires ongoing maintenance, not a one-time action.
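Because removal needs ongoing maintenance, it helps to track what was sent and when. The sketch below is one possible way to do that in Python. The 45-day response window uses the upper end of the range given above. The 90-day recheck interval, the class name, and the sample dates are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RESPONSE_WINDOW_DAYS = 45   # upper end of the 30-45 day range mentioned above
RECHECK_INTERVAL_DAYS = 90  # hypothetical cadence; choose one you can sustain


@dataclass
class OptOutRequest:
    broker: str
    sent_on: date
    confirmed: bool = False

    @property
    def response_due(self) -> date:
        return self.sent_on + timedelta(days=RESPONSE_WINDOW_DAYS)

    @property
    def recheck_on(self) -> date:
        return self.sent_on + timedelta(days=RECHECK_INTERVAL_DAYS)


def needs_attention(requests, today):
    """List brokers that are overdue to respond or due for a recheck."""
    overdue = [r.broker for r in requests if not r.confirmed and today > r.response_due]
    recheck = [r.broker for r in requests if r.confirmed and today >= r.recheck_on]
    return {"overdue": overdue, "recheck": recheck}


requests = [
    OptOutRequest("Spokeo", date(2026, 1, 10), confirmed=True),
    OptOutRequest("WhitePages", date(2026, 2, 1)),
    OptOutRequest("Intelius", date(2026, 4, 1)),
]
print(needs_attention(requests, today=date(2026, 4, 20)))
```

A spreadsheet does the same job. What matters is recording the send date so that a missed deadline or a reappearing listing gets noticed.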
Using AI Tools Without Giving Away Everything
AI assistants — ChatGPT, Claude, Gemini, and others — are becoming central to how people work and learn. But using them involves sending data to third-party servers. Understanding what that means helps you use them appropriately.
What happens to what you type:
When you send a message to an AI assistant:
What definitely happens:
- Your message is transmitted to and processed on the company's servers
- The company can see the content
What varies by company and settings:
- Whether conversations are stored
- How long they are retained
- Whether they are used to train future models
- Whether humans review conversations for safety
OpenAI (ChatGPT):
- Conversations stored by default
- Can be opted out of training in settings
- Human review of some conversations for safety
Anthropic (Claude):
- Conversations stored with retention periods
- Privacy settings available
- Usage policies prohibit certain data types
Google (Gemini):
- Integrates with Google account activity
- Subject to Google's broader data policies
Practical rules for using AI tools safely:
- Do not paste documents containing other people’s personal information
- Do not share full names alongside sensitive details — use “my colleague” not their name
- Do not paste passwords, credentials, API keys, or authentication tokens
- For sensitive professional topics (legal matters, medical details, financial specifics), consider whether the specific details are necessary or whether the AI can help with a generalised version of the question
- Use the API (if available) with data retention disabled for sensitive professional use cases
- Check whether your AI tool of choice allows you to turn off conversation history — most do
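Some of the rules above can be partly automated by scrubbing text before it is pasted into an AI tool. The sketch below uses a few illustrative regular expressions. The patterns and placeholder labels are assumptions, they will miss many real formats, and they cannot detect names or context, so treat this as a safety net and still review the text yourself.

```python
import re

# Order matters: secrets are replaced before phone numbers so that digit runs
# inside a key are not partially matched as a phone number.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SECRET", re.compile(r"\b(?:sk|pk|api|key|token)[-_][A-Za-z0-9_-]{16,}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact(text: str) -> str:
    """Replace anything matching the patterns above with a [LABEL] placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Email jane.doe@example.com or call +44 20 7946 0958. Key: sk-abcdef1234567890abcdef"
print(redact(sample))
```

The same approach extends to account numbers or internal project names by adding patterns specific to your own work.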
PRO TIP: For genuinely sensitive professional work — legal documents, medical records, confidential business data — consider running a local open-source model like Llama 3 on your own hardware. Your data never leaves your computer. The capability is somewhat lower than frontier models, but for many professional tasks it is entirely sufficient. Ollama makes running local models accessible without deep technical knowledge.
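For readers who want to script a local model, the sketch below sends a prompt to Ollama's local HTTP API using only the Python standard library. It assumes Ollama is running on its default port and that a model tagged "llama3" has already been pulled. The helper names are this example's own.

```python
import json
import urllib.request

# Ollama's default local endpoint; requests to it stay on your machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for a locally running Ollama."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, model: str = "llama3") -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Example call (requires Ollama to be running with the model available):
# print(ask_local_model("Summarise the main obligations in the clause below: ..."))
```

Because the endpoint is on localhost, the prompt is not sent to a third-party server, which is the property that matters for confidential material.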
What Your Rights Actually Are
Data protection rights vary significantly by location. Here is a practical summary:
Data rights by region (2026):
European Union — GDPR (strongest protection):
Right to access: see all data a company holds on you
Right to erasure: request deletion ("right to be forgotten")
Right to portability: receive your data in a usable format
Right to object: to processing for direct marketing
Consent required: for most data collection
Enforcement: significant fines (up to 4% of global revenue)
United Kingdom — UK GDPR (similar to EU post-Brexit):
Broadly same rights as EU GDPR
ICO (Information Commissioner's Office) enforces
California — CCPA/CPRA:
Right to know: what data is collected and sold
Right to delete: request deletion from businesses
Right to opt-out: of sale of personal information
Right to correct: inaccurate personal information
Applies to: businesses meeting size/revenue thresholds
Rest of US — fragmented:
No comprehensive federal privacy law as of 2026
State laws vary significantly
Sector-specific laws: HIPAA (health), FERPA (education), COPPA (children) provide some protection
Pakistan — Personal Data Protection Act 2023:
Recently enacted framework
Rights to access and correction
Consent requirements for data processing
Enforcement infrastructure still developing
Practical takeaway:
EU/UK residents have the strongest enforceable rights.
Exercise them — companies must respond.
Others: rely more heavily on personal protective measures.
WARNING: Privacy policies are almost never read, and companies know this. The average privacy policy takes 18 minutes to read. The average person encounters 1,462 privacy policy decisions per year. Reading all of them would take 76 work days. The practical implication: do not assume a privacy policy protects you. Assume data is collected unless you have specifically checked or limited it.
The Honest Limits of Individual Action
This article has given you practical steps. It would be incomplete without saying clearly: individual action has real limits.
The data economy is built on scale. Even if you personally opt out of every data broker, use Signal, block trackers, and use a VPN, the systemic collection of data about populations continues. Your data exists in aggregate datasets not because of individual agreements you made, but because AI inference can build profiles from data about people similar to you.
The most significant data privacy protection for individuals in the long run comes from legislation — laws that restrict what companies can collect, how long they can retain it, and what inferences they can make from it.
GDPR in Europe is the most significant example of what effective legislation can achieve. It is not perfect. But it has changed corporate behaviour at scale in ways that individual opt-outs never could.
Participating in political processes, supporting privacy-focused legislation, and choosing to work for and spend money with companies that demonstrate genuine data minimisation practices are all forms of privacy protection that work at the scale where the actual problem lives.
Individual hygiene matters. But the scope of the problem requires systemic solutions.
Frequently Asked Questions
Is it possible to be completely private online in 2026?
Not practically, for most people. Complete privacy would require avoiding smartphones, using only cash, never using internet-connected services, and living in a jurisdiction with strong privacy laws. For most people, the goal is reducing unnecessary data exposure and protecting the categories that matter most — not achieving zero collection, which is not achievable while participating in modern economic and social life.
Does using a VPN protect my data from AI companies?
A VPN hides your IP address and encrypts your traffic from your internet service provider and anyone monitoring the network between you and the VPN server. It does not protect data you voluntarily provide to websites and services. If you use a VPN and then log into Google, Google still sees everything you do while logged in. VPNs are useful for preventing ISP data collection and protecting against network-level surveillance — not for protecting data you give directly to apps and services.
Are incognito or private browsing modes private?
Less than most people think. Incognito mode prevents your browser from storing your browsing history locally on your device. It does not hide your activity from your internet service provider, your employer (on work networks), the websites you visit, or Google if you are signed into a Google service. It is useful for preventing others who share your device from seeing your history. It provides minimal protection against the data collection practices this article covers.
Should I be worried about smart home devices like Alexa and Google Home?
Yes — with proportional concern. These devices listen for wake words, which requires continuously processing audio. Research has documented accidental activations where non-wake-word conversations were recorded and transmitted. The data practices of smart home devices are among the least transparent in the consumer technology space. If you use them, place them away from rooms where sensitive conversations happen. If you do not use them regularly, the privacy tradeoff is questionable.
How do I know if my data has been breached?
HaveIBeenPwned.com is a free, reputable service that indexes known data breaches and lets you check whether your email address appears in them. Enter your email and it shows you which breaches included your data and what types of information were exposed. Set up alerts so you are notified automatically when your email appears in a new breach. This is the most useful free breach monitoring available to individuals.
Do AI companies use my conversations to train their models?
By default, many do — though policies vary and have changed frequently. OpenAI, Google, and Anthropic all have settings that allow you to opt out of conversation data being used for training. These settings are not always prominently displayed. Check the privacy settings of any AI tool you use regularly and decide whether you are comfortable with the default. For professional use involving sensitive data, opt out of training data collection and consider whether the tool’s privacy policy meets your organisation’s requirements.
Conclusion
You gave away more than you realised before you opened this article.
That is not an accusation — it is just how the system was designed. Default settings favour collection. Opt-out processes are buried. The value exchange — free services in return for data — was never clearly priced.
AI has raised the stakes by making that data dramatically more valuable. An AI system that can infer your health conditions from your shopping history, predict your political views from your location patterns, and assess your creditworthiness from your social connections is something qualitatively different from a retailer knowing your shoe size.
The practical response is not panic. It is prioritisation.
Protect your credentials with a password manager and MFA. Limit location permissions on apps that do not need them. Use encrypted messaging for sensitive conversations. Remove yourself from major data brokers. Understand what you are giving AI tools before pasting sensitive information.
None of these actions eliminates the risk. All of them meaningfully reduce it — and reduce it for the categories of data where exposure has the most tangible consequences for your life.
The age of AI means the data you have already given away is more powerful than it used to be. It also means you have more information than ever about how that data is being used — and more tools than ever to limit it.
Start with the steps in this article. Pick the three that apply most to your situation. Do those first.
If this guide gave you something actionable that you will actually do, share it with someone who thinks data privacy is too complicated to bother with. Leave a question in the comments — privacy questions are almost always more specific to individual situations than a general article can cover, and specific questions get specific answers.


