TechForge

19th June 2025

Companies placing any form of content online have little choice but to allow AI scraping robots access to their media – text, images, video. That seems to be the norm in 2025, and it’s a situation that the vast majority of digital marketers have accepted.

A question like, “Who’s offering the best price on women’s trouser suits?” is today likely to be answered by an AI like Gemini, positioned above the search engine’s ‘traditional’ list of retailers’ pages that the user can manually pick their way through. Instead of howling with protest at any perceived unfairness of the situation, companies are working hard to optimise their content for better representation in the answers given by the big AI companies.

Blocking AI bots

But not all companies choose to roll with the punches and accept the ‘new normal.’ In a recent interview with Business Insider, CEO of Nextdoor, Nirav Tolia, recounts that not only does his site, nextdoor.com, block AI bots from scraping its content, but he built the company even without optimising his site for Google Search. To this day, he says, “we’ve never allowed our content to be scraped, to be distributed – we aren’t crawled by any of the search engines.”

That’s an extreme view, and a decision taken by Tolia on what he describes as ideological grounds. As he admits, refusal to allow AIs access to the community-generated content on Nextdoor means it’s incumbent on him and his team to give users the same level of customer experience that ChatGPT search users receive, but without using ChatGPT-type technology.

But Nirav Tolia’s approach is not as extreme as other content creators’ responses to the big AI engines scraping the internet for both learning materials and up-to-date content to produce in response to queries.

Sue the AI companies

Several lawsuits are in progress in which creators and artists are taking AI companies through the courts, alleging infringement of the copyright in their works. Among the dozens of extant cases, a group of artists has filed against Stability AI, Midjourney, Runway AI, and DeviantArt, alleging misuse of their works to train AI image-generation models.

In the biggest case of recent months and at the other end of the scale, Disney and NBCUniversal have filed against Midjourney, the AI image generation company, claiming the technology can create unauthorised images based on copyrighted images, such as those from the Disney pantheon and the Star Wars world.

In several past courtroom battles, big AI execs have stated repeatedly that scraping others’ content falls under the ‘fair use’ provisions of copyright law, implying that materials found online can be treated much as if they were in the public domain, because the entities gathering the materials (the AIs) don’t offer them back up in the form of imitative creations.

The only difference between legal cases brought by global multinational companies like NBCUniversal, and the smaller-scale class actions brought by artists such as Karla Ortiz, is money. Disney and Lucasfilm stand, potentially, to lose more than one-person-band artists in pure monetary terms (although a million dollars means more to the latter), and the big Hollywood studios have the funds to pay more lawyers for longer.

But intellectually, and in the eyes of the law, there is little difference between the smaller class actions and any legal juggernaut driven by big Hollywood studios. It will probably take a decade or more for many legal cases to reach judgement, and until then, AI bots will continue their quiet work.

Block, please?

The third path that some content creators choose to take is to ask the AI companies not to scrape their online properties. Experience has shown that companies like OpenAI and Anthropic ignore the decades-old method of limiting automated external access to websites by means of a line of text in a robots.txt file. Sitting at the root level of websites, a typical robots.txt file looks something like this:

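# Default rules for any crawler not named below
User-agent: *
Disallow: /admin/

# OpenAI's crawler: keep out of the whole site
User-agent: GPTBot
Disallow: /

# Google's search crawler: nothing is off limits
User-agent: Googlebot
Disallow: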

Designed to be human-readable, the above example tells web scrapers in general that it’s OK to read the contents of the site apart from the admin section, gives the search engine’s indexer (the Googlebot) unrestricted access, and tells an AI bot (GPTBot, in this case) to stay out altogether. It’s a ‘gentleman’s agreement’ system that relies on good faith, and can be ignored. In most cases, it is indeed ignored.
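
For readers who want to see how a well-behaved crawler would interpret such rules, here is a minimal sketch using Python's standard-library urllib.robotparser. The example.com domain, the "SomeOtherBot" name, and the allowed() helper are illustrative assumptions, not anything used by the sites or companies discussed here.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, one group per crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Disallow:
"""

# Parse the text directly instead of fetching it over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Hypothetical helper: may `agent` fetch `path` on the (made-up) example.com?"""
    return parser.can_fetch(agent, "https://example.com" + path)


if __name__ == "__main__":
    for agent in ("GPTBot", "Googlebot", "SomeOtherBot"):
        for path in ("/articles/1", "/admin/settings"):
            print(agent, path, allowed(agent, path))
```

A crawler that plays by the rules follows only the group that names it, falling back to the catch-all group otherwise, which is why the Googlebot in this sketch is not bound by the /admin/ rule. Nothing in the file enforces any of this: a bot that never runs such a check simply fetches whatever it likes.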

So what can companies protective of their intellectual property do? To be like Nextdoor, companies need significant technology resources to effectively block access by third parties, and typically, it’s an ongoing process that has to be constantly updated and tweaked, in ways analogous to cybersecurity measures that play a cat-and-mouse game with bad actors. In this case, however, the so-called bad actors are the evolving bots reading the web for content, and the defenders work for companies like Nextdoor, erecting barriers and firewalls to prevent access.

Anyone using the web these days is growing accustomed to CAPTCHA checks, ranging from image identification to the common “click to show you’re not a robot” prompts. These can, theoretically, be used to prevent bots accessing a website, but they are so common and well-known that most sophisticated bots can find their way around them.

Block with extreme prejudice

Several thousand websites have deployed more technical tools to head AI bots off at the pass. By asking the device (be it a graphical web browser or automated AI scraping bot) to solve a puzzle written in the JavaScript programming language, visitors can be allowed or denied access (typical scraping bots don’t have JavaScript capabilities). As you might expect, the majority of sites so protected hold highly technical content written by technology experts; individuals and organisations with the knowledge to implement such a gateway.

Partner

The final option is only really available to content creators that attract enormous audiences for their work, such as the New York Times, or the owners of social media site Reddit. These are the entities with enough content ‘muscle’ to reach private agreements with the AI companies. Under such partner agreements, companies like Google, OpenAI, or X are given levels of privileged access to data repositories that in some cases represent decades of content.

In return, the publisher gets a fee, which becomes part of the way they can monetise their content. For news sites, such agreements are helping bridge the gap between paper and digital media – a gap that many publications have struggled to cross in the last 20 years.

Adapt

Most marketing professionals have little choice but to embark on a new learning journey, revising their older ideas of search engine optimisation, and adopting new strategies to ensure their messages are placed in front of the right people. There are, ironically, a good number of AI-powered tools out there to help professionals adapt their content so it can thrive at a time when AIs often proxy content between creator and consumer.

To adapt to the changing paradigm, marketers will have to find new ways to create impactful content. It pays to learn about how AIs work, and the different ways large companies present messages in AI-generated environments. By arming themselves with specialist knowledge of how LLMs and multi-modal AIs work, companies can differentiate themselves from their competitors. If marketers work on the safe assumption that colleagues working elsewhere have access to the same tools they do, a greater insight into those tools’ internal mechanisms is what will produce marketing methods that are a cut above the rest.

Conclusions

The choices for content creators are block, adapt, sue, or partner. For smaller businesses looking to make an impression in the digital environment, the only real choice is to adapt. Blocking, suing, and partnering are either beyond most organisations’ budgets, or too technically challenging to implement easily.

(Image source: “We make a good team…” by Ed Yourdon is licensed under CC BY-NC-SA 2.0.)


About the Author

joe@techforge.pub

Joe Green is a writer based in Bristol, UK. He acquired his first computer and dial-up modem in 1992 and has worked in the tech industry since 2000. He writes and podcasts, specialising in open-source, networking, cybersecurity, software development, and online privacy.
