{"id":968,"date":"2024-06-27T07:00:38","date_gmt":"2024-06-27T07:00:38","guid":{"rendered":"https:\/\/poyesis.fr\/blogs\/?p=968"},"modified":"2025-02-03T07:54:46","modified_gmt":"2025-02-03T07:54:46","slug":"vol-de-donnees-perplexity-ai","status":"publish","type":"post","link":"https:\/\/poyesis.fr\/blogs\/vol-de-donnees-perplexity-ai\/","title":{"rendered":"Perplexity AI caught red-handed stealing data"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Perplexity AI, a unicorn that promises to make Google look \u201cold-fashioned\u201d (its CEO\u2019s words), has been caught in the middle of scraping data.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">And it\u2019s not the first time.<\/span><\/p>\n<h2><b>What is Perplexity AI?<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">If you don\u2019t follow the tech world closely, chances are you haven\u2019t heard of Perplexity AI yet.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">It\u2019s a cross between a search engine and a chatbot supercharged with generative AI. Perplexity AI sets itself apart from ChatGPT by providing results based on real-time data (with its sources).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">And it aims to outdo Google with condensed answers that are meant to be free of hallucinations.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The startup was co-founded in 2022 by a former OpenAI employee, and in March 2024 it managed to raise its valuation to 1 billion dollars, which makes it a unicorn.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Some see this brand-new search engine as Google\u2019s replacement. 
A fight vaguely reminiscent of Google Chrome versus Firefox and Internet Explorer\u2026<\/span><\/p>\n<h2><b>How was the deception discovered?<\/b><\/h2>\n<p><span style=\"font-weight: 400;\">Now that the accused has been brought in, let\u2019s see what it is charged with.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Robb Knight, a developer at Radweb and creator of the tech blog rKnight, accuses Perplexity AI of ignoring the instructions in robots.txt files.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">These are the files that let webmasters forbid search engine robots (crawlers, or spiders) from accessing certain pages.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Yet Perplexity AI does not respect them at all, which lets it steal data without being spotted.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">It all starts in March 2024.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Robb Knight decides to block Perplexity AI on his blog.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To do so, he adds the user agent of Perplexity\u2019s crawler, PerplexityBot, to the disallow list in his robots.txt file.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Next, he decides to check whether the AI search engine\/chatbot still has access to his content.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">He gives it the URL of one of his articles and asks it to summarize it.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">And then\u2026<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Perplexity summarizes it in such detail that it is impossible to believe the AI guessed it all.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Robb therefore checks via 
Nginx (<\/span><a href=\"https:\/\/poyesis.fr\/blogs\/quest-ce-que-nginx-et-pourquoi-les-sites-web-en-raffolent\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">we explain what Nginx is here<\/span><\/a><span style=\"font-weight: 400;\">) and the result is clear-cut: PerplexityBot is indeed blocked.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">On June 14, he even configures his servers to return a 403 error when Perplexity\u2019s robots try to access his content.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Still no effect.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">He finally finds the explanation by looking at his servers\u2019 log files.<\/span><\/p>\n<p><b>Perplexity AI has been lying about its crawler\u2019s user agent from the start<\/b><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The search engine disguises its page requests behind a common user agent. 
The one generally associated with Google Chrome on Windows 10.<\/span><\/p>\n<p><a href=\"https:\/\/rknight.me\/blog\/perplexity-ai-is-lying-about-its-user-agent\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Robb Knight tells the whole story in his blog post<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">He ran the same test on the MacStories site and the result was the same.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">And he is not the only one to have noticed\u2026<\/span><\/p>\n<h2><b>Forbes also detected Perplexity AI\u2019s illegal scraping and has gone on the warpath<\/b><\/h2>\n<figure id=\"attachment_970\" aria-describedby=\"caption-attachment-970\" style=\"width: 1600px\" class=\"wp-caption aligncenter\"><img fetchpriority=\"high\" decoding=\"async\" class=\"wp-image-970\" src=\"https:\/\/poyesis.fr\/blogs\/wp-content\/uploads\/2024\/06\/Homme-tenant-un-magazine-Forbes.jpg\" alt=\"Man holding a Forbes magazine\" width=\"1600\" height=\"1999\" srcset=\"https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Homme-tenant-un-magazine-Forbes.jpg 1600w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Homme-tenant-un-magazine-Forbes-240x300.jpg 240w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Homme-tenant-un-magazine-Forbes-820x1024.jpg 820w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Homme-tenant-un-magazine-Forbes-768x960.jpg 768w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Homme-tenant-un-magazine-Forbes-1229x1536.jpg 1229w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Homme-tenant-un-magazine-Forbes-750x937.jpg 750w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Homme-tenant-un-magazine-Forbes-1140x1424.jpg 1140w\" sizes=\"(max-width: 1600px) 100vw, 1600px\" \/><figcaption id=\"caption-attachment-970\" 
class=\"wp-caption-text\">Man holding a Forbes magazine<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">Randall Lane, chief content officer of Forbes Media, raised the alarm on June 11, 2024.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In his article \u201c<\/span><a href=\"https:\/\/www.forbes.com\/sites\/randalllane\/2024\/06\/11\/why-perplexitys-cynical-theft-represents-everything-that-could-go-wrong-with-ai\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Why Perplexity\u2019s Cynical Theft Represents Everything That Could Go Wrong With AI<\/span><\/a><span style=\"font-weight: 400;\">\u201d (the title, at least, leaves no doubt about his feelings toward Perplexity), he writes:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\u00ab <\/span><i><span style=\"font-weight: 400;\">AI is only as good as the people overseeing it. I am a believer in AI and, in the right hands, productivity, progress and prosperity await. 
<\/span><\/i><i><span style=\"font-weight: 400;\"><br \/>\n<\/span><\/i><i><span style=\"font-weight: 400;\"><br \/>\n<\/span><\/i><i><span style=\"font-weight: 400;\">But in the hands of people like Aravind Srinivas, CEO of Perplexity AI, who has a reputation for being great at the PhD-level technical work and less so at the basic human aspects, amorality poses an existential risk<\/span><\/i><span style=\"font-weight: 400;\"> \u00bb.<\/span><\/p>\n<figure id=\"attachment_972\" aria-describedby=\"caption-attachment-972\" style=\"width: 1280px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" class=\"size-full wp-image-972\" src=\"https:\/\/poyesis.fr\/blogs\/wp-content\/uploads\/2024\/06\/Aravind-Srinivas-PDG-de-Perplexity-AI.jpg\" alt=\"Aravind Srinivas, CEO of Perplexity AI\" width=\"1280\" height=\"720\" srcset=\"https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Aravind-Srinivas-PDG-de-Perplexity-AI.jpg 1280w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Aravind-Srinivas-PDG-de-Perplexity-AI-300x169.jpg 300w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Aravind-Srinivas-PDG-de-Perplexity-AI-1024x576.jpg 1024w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Aravind-Srinivas-PDG-de-Perplexity-AI-768x432.jpg 768w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Aravind-Srinivas-PDG-de-Perplexity-AI-750x422.jpg 750w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Aravind-Srinivas-PDG-de-Perplexity-AI-1140x641.jpg 1140w\" sizes=\"(max-width: 1280px) 100vw, 1280px\" \/><figcaption id=\"caption-attachment-972\" class=\"wp-caption-text\">Aravind Srinivas, CEO of Perplexity AI<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">The thing is, Forbes has also noticed Perplexity AI\u2019s content theft.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">And it is not happy about it at all.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Not only is Forbes\u2019s paid, exclusive content accessible via Perplexity, but the company doesn\u2019t even cite it.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For his part, Aravind Srinivas, <\/span><a href=\"https:\/\/x.com\/AravSrinivas\/status\/1799159732126794017\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">CEO of Perplexity AI, tried to defend his company\u2019s practices on X<\/span><\/a><span style=\"font-weight: 400;\">. He said the problem stems from a new feature, \u201cPerplexity Pages\u201d, launched two weeks earlier.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">(While taking a swipe at his competitors, ChatGPT, Gemini and Copilot, along the way.)<\/span><\/p>\n<p><span style=\"font-weight: 400;\">That was not enough to satisfy <\/span><a href=\"https:\/\/www.axios.com\/2024\/06\/18\/forbes-perplexity-ai-legal-action-copyright\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Forbes, which was reported on June 18, 2024 to have threatened Perplexity AI with legal action<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<h2><b>Why are Perplexity AI\u2019s questionable practices a problem?<\/b><\/h2>\n<figure id=\"attachment_971\" aria-describedby=\"caption-attachment-971\" style=\"width: 1592px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" class=\"size-full wp-image-971\" src=\"https:\/\/poyesis.fr\/blogs\/wp-content\/uploads\/2024\/06\/Hacker.jpg\" alt=\"Hacker\" width=\"1592\" height=\"1999\" srcset=\"https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Hacker.jpg 1592w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Hacker-239x300.jpg 239w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Hacker-816x1024.jpg 816w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Hacker-768x964.jpg 768w, 
https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Hacker-1223x1536.jpg 1223w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Hacker-750x942.jpg 750w, https:\/\/poyesis.fr\/wp-content\/uploads\/2024\/06\/Hacker-1140x1431.jpg 1140w\" sizes=\"(max-width: 1592px) 100vw, 1592px\" \/><figcaption id=\"caption-attachment-971\" class=\"wp-caption-text\">Hacker<\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">For a start, there is a notion that seems vague and abstract to Perplexity AI\u2019s engineers: \u201crespect for intellectual property\u201d.<\/span><\/p>\n<p>(We have already covered the headache of <a href=\"https:\/\/poyesis.fr\/blogs\/qui-detient-la-propriete-intellectuelle-de-votre-site-web\/\" target=\"_blank\" rel=\"noopener\">intellectual property for websites<\/a> and that of your <a href=\"https:\/\/poyesis.fr\/blogs\/d\/\" target=\"_blank\" rel=\"noopener\">source code<\/a>.)<\/p>\n<p><span style=\"font-weight: 400;\">Beyond disregarding this concept, stealing content and passing it off as one\u2019s own has serious repercussions:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">it deprives content creators of their sources of revenue (this is what happened when Forbes found its exclusive stories on Perplexity);<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">traffic to the source websites drops.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">For publishers and news organizations, it is an attack on their business model.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For example, here is the<a href=\"https:\/\/www.lemonde.fr\/le-monde-et-vous\/article\/2021\/01\/26\/les-revenus-du-monde-des-sources-diversifiees_6067680_6065879.html\" 
target=\"_blank\" rel=\"noopener\">\u00a0breakdown of the revenue of the newspaper \u201cLe Monde\u201d in 2022<\/a>:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Digital and print subscriptions: 48%<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Single-copy sales: 20%<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Advertising: 23%<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Diversification: 7%<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Public and private subsidies: 2%<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">By republishing their content, AIs such as Perplexity AI cut into newspapers\u2019 largest source of funding, subscriptions.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A story to follow, then\u2026<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Perplexity AI, a unicorn that promises to make Google look \u201cold-fashioned\u201d (its CEO\u2019s words), has been caught in the middle of scraping data. And it\u2019s not the first time. What is Perplexity AI? 
If you don\u2019t follow the tech world closely, chances are you [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":969,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jnews-multi-image_gallery":[],"jnews_single_post":{"format":"standard","override":[{"template":"7","single_blog_custom":"553","parallax":"1","fullscreen":"1","layout":"no-sidebar-narrow","sidebar":"default-sidebar","second_sidebar":"default-sidebar","sticky_sidebar":"1","share_position":"floatbottom","share_float_style":"share-normal","show_share_counter":"1","show_view_counter":"1","show_featured":"1","show_post_meta":"1","show_post_author":"1","show_post_author_image":"1","show_post_date":"1","post_date_format":"default","post_date_format_custom":"Y\/m\/d","show_post_category":"1","show_post_reading_time":"1","post_reading_time_wpm":"300","post_calculate_word_method":"str_word_count","show_zoom_button":"0","zoom_button_out_step":"2","zoom_button_in_step":"3","show_post_tag":"1","show_prev_next_post":"1","show_popup_post":"1","number_popup_post":"1","show_author_box":"1","show_post_related":"1","show_inline_post_related":"1"}],"image_override":[{"single_post_thumbnail_size":"crop-500","single_post_gallery_size":"crop-500"}],"trending_post_position":"meta","trending_post_label":"Trending","sponsored_post_label":"Sponsored 
by","disable_ad":"0","subtitle":""},"jnews_primary_category":[],"jnews_override_bookmark_settings":{"override_bookmark_button":"0","override_show_bookmark_button":"0"},"jnews_override_counter":{"view_counter_number":"0","share_counter_number":"0","like_counter_number":"0","dislike_counter_number":"0"},"footnotes":""},"categories":[113],"tags":[239,240,67,241,242,243],"class_list":["post-968","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-actualite","tag-ai","tag-donnees","tag-intelligence-artificielle","tag-perplexity","tag-scrapping","tag-vol"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/poyesis.fr\/blogs\/wp-json\/wp\/v2\/posts\/968","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/poyesis.fr\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/poyesis.fr\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/poyesis.fr\/blogs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/poyesis.fr\/blogs\/wp-json\/wp\/v2\/comments?post=968"}],"version-history":[{"count":1,"href":"https:\/\/poyesis.fr\/blogs\/wp-json\/wp\/v2\/posts\/968\/revisions"}],"predecessor-version":[{"id":1177,"href":"https:\/\/poyesis.fr\/blogs\/wp-json\/wp\/v2\/posts\/968\/revisions\/1177"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/poyesis.fr\/blogs\/wp-json\/wp\/v2\/media\/969"}],"wp:attachment":[{"href":"https:\/\/poyesis.fr\/blogs\/wp-json\/wp\/v2\/media?parent=968"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/poyesis.fr\/blogs\/wp-json\/wp\/v2\/categories?post=968"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/poyesis.fr\/blogs\/wp-json\/wp\/v2\/tags?post=968"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}