{"id":680,"date":"2024-08-17T14:09:42","date_gmt":"2024-08-17T12:09:42","guid":{"rendered":"https:\/\/www.nicedata.fr\/?p=680"},"modified":"2024-10-07T15:07:27","modified_gmt":"2024-10-07T13:07:27","slug":"notre-premier-notebook-spark-dans-synapse","status":"publish","type":"post","link":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/","title":{"rendered":"Notre premier notebook Spark dans Synapse"},"content":{"rendered":"\n<p>Synapse nous permet d&rsquo;utiliser Apache Spark en tant que Runtime de processing afin de travailler nos diff\u00e9rents datasets. Dans cet article, nous allons cr\u00e9er notre premier notebook Spark (et le pool Spark n\u00e9cessaire) pour transformer nos premi\u00e8res donn\u00e9es de la zone bronze pour commencer \u00e0 alimenter notre zone silver.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Introduction_et_pool_Spark\"><\/span>Introduction et pool Spark<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Un Notebook Spark a besoin d&rsquo;un cluster Spark pour faire tourner le code \u00e9crit dans celui-ci. Dans Synapse, cela se mat\u00e9rialise par un \u00ab\u00a0Spark pool\u00a0\u00bb qui contiendra la d\u00e9finition de notre cluster Spark et qui porte la facturation du processing Spark. Nous n&rsquo;avons pour l&rsquo;instant jamais utilis\u00e9 Spark et nous allons dans un premier temps devoir cr\u00e9er ce \u00ab\u00a0Pool\u00a0\u00bb.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Nous allons ici configurer un cluster Spark de petite taille, mais libre \u00e0 vous de cr\u00e9er un cluster plus \u00ab\u00a0gros\u00a0\u00bb. 
Cependant, attention \u00e0 la facture !<\/p>\n<\/blockquote>\n\n\n\n<p>Tout se passe dans la partie management de Synapse, o\u00f9 nous allons faire un \u00ab\u00a0+ New\u00a0\u00bb Apache Spark pool.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"654\" height=\"556\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-25.png?resize=654%2C556&#038;ssl=1\" alt=\"\" class=\"wp-image-681\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-25.png?w=654&amp;ssl=1 654w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-25.png?resize=300%2C255&amp;ssl=1 300w\" sizes=\"auto, (max-width: 654px) 100vw, 654px\" \/><figcaption class=\"wp-element-caption\">Cr\u00e9ation d&rsquo;un pool Spark<\/figcaption><\/figure>\n\n\n\n<p>La premi\u00e8re section de configuration de notre pool nous propose la configuration basique de celui-ci. 
Nous allons lui fournir :<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Un nom<\/li>\n\n\n\n<li>Une taille de n\u0153ud (un cluster est compos\u00e9 de plusieurs unit\u00e9s appel\u00e9es n\u0153uds) : commen\u00e7ons par un \u00ab\u00a0petit \/ small\u00a0\u00bb, il sera toujours temps de modifier ce param\u00e8tre plus tard si besoin<\/li>\n\n\n\n<li>La possibilit\u00e9 d&rsquo;\u00ab\u00a0autoscale\u00a0\u00bb : on autorise (ou non) notre cluster \u00e0 augmenter ou r\u00e9duire automatiquement son nombre de n\u0153uds<\/li>\n\n\n\n<li>Nombre de n\u0153uds : si on a autoris\u00e9 notre cluster \u00e0 changer son nombre de n\u0153uds automatiquement, nous allons ici lui poser des limites (haute et basse)<\/li>\n\n\n\n<li>Allocation dynamique d&rsquo;ex\u00e9cuteurs (un ex\u00e9cuteur est un processus, lanc\u00e9 sur un n\u0153ud du cluster, qui \u00ab\u00a0ex\u00e9cute\u00a0\u00bb du code) : si l&rsquo;on active ce param\u00e8tre, le cluster pourra, lors de l&rsquo;ex\u00e9cution d&rsquo;une application Spark, choisir automatiquement le nombre d&rsquo;ex\u00e9cuteurs n\u00e9cessaires et les ajouter \u00e0 la vol\u00e9e<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"742\" height=\"898\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-26.png?resize=742%2C898&#038;ssl=1\" alt=\"\" class=\"wp-image-682\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-26.png?w=742&amp;ssl=1 742w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-26.png?resize=248%2C300&amp;ssl=1 248w\" sizes=\"auto, (max-width: 742px) 100vw, 742px\" \/><figcaption class=\"wp-element-caption\">Param\u00e9trage de notre pool Spark<\/figcaption><\/figure>\n\n\n\n<p>Il est possible de choisir une \u00ab\u00a0taille maximale\u00a0\u00bb des n\u0153uds du cluster. 
On parle ici des vCores et de la RAM disponible pour chaque n\u0153ud du cluster. Ceci est tr\u00e8s lin\u00e9aire et, si ce choix impacte les performances de nos processus, il impacte aussi grandement le co\u00fbt \u00e0 l&rsquo;heure d&rsquo;utilisation ! Attention donc \u00e0 ne pas \u00eatre trop gourmand !<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"748\" height=\"211\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-27.png?resize=748%2C211&#038;ssl=1\" alt=\"\" class=\"wp-image-683\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-27.png?w=748&amp;ssl=1 748w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-27.png?resize=300%2C85&amp;ssl=1 300w\" sizes=\"auto, (max-width: 748px) 100vw, 748px\" \/><figcaption class=\"wp-element-caption\">Les diff\u00e9rentes tailles de n\u0153ud Spark<\/figcaption><\/figure>\n\n\n\n<p>Pour les param\u00e8tres additionnels, nous allons nous concentrer sur deux \u00e9l\u00e9ments principaux : la mise en pause automatique et la version de Spark.<\/p>\n\n\n\n<p>La mise en pause automatique permet \u00e0 notre cluster Spark de s&rsquo;\u00e9teindre lorsqu&rsquo;il n&rsquo;est pas utilis\u00e9 pendant un certain laps de temps. Ce param\u00e8tre est particuli\u00e8rement int\u00e9ressant pour ne pas \u00ab\u00a0payer\u00a0\u00bb lorsqu&rsquo;il n&rsquo;y a pas de traitement en cours. Il faut cependant garder en t\u00eate qu&rsquo;un cluster Spark ne d\u00e9marre pas de fa\u00e7on instantan\u00e9e. Si nous autorisons celui-ci \u00e0 s&rsquo;\u00e9teindre, il faudra tenir compte de son temps de red\u00e9marrage lors des ex\u00e9cutions suivantes. Nous avons donc l\u00e0 un choix \u00e0 faire entre r\u00e9activit\u00e9 et co\u00fbt. 
Nous allons dans notre configuration autoriser notre cluster \u00e0 s&rsquo;\u00e9teindre apr\u00e8s 15 minutes d&rsquo;inactivit\u00e9 (Number of minutes idle).<\/p>\n\n\n\n<p>Nous avons ensuite le choix de la version de Spark. Uniquement les versions support\u00e9es par Microsoft dans Synapse sont disponibles et il faut bien comprendre qu&rsquo;il ne sera pas possible de changer la version de notre cluster plus tard. Nous pourrons simplement en cr\u00e9er un nouveau avec une autre version si besoin. L\u00e0 encore, le choix peut s&rsquo;av\u00e9rer critique selon nos besoins mais dans mon cas, je vais utiliser la derni\u00e8re version disponible \u00e0 date.<\/p>\n\n\n\n<p>Il y a d&rsquo;autres param\u00e8tres additionnels que l&rsquo;on peut configurer pour notre pool Spark. Nous ne nous y int\u00e9resserons pas dans cet article qui n&rsquo;a pas la volont\u00e9 de rentrer dans le d\u00e9tail de Spark.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"773\" height=\"1024\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-28.png?resize=773%2C1024&#038;ssl=1\" alt=\"\" class=\"wp-image-684\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-28.png?resize=773%2C1024&amp;ssl=1 773w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-28.png?resize=226%2C300&amp;ssl=1 226w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-28.png?resize=768%2C1018&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-28.png?w=784&amp;ssl=1 784w\" sizes=\"auto, (max-width: 773px) 100vw, 773px\" \/><figcaption class=\"wp-element-caption\">Configuration avanc\u00e9e du pool Spark<\/figcaption><\/figure>\n\n\n\n<p>Une fois cr\u00e9\u00e9, notre pool sera visible dans l&rsquo;interface de management (et de monitoring) de nos pools Spark. 
Suivant les besoins de notre plateforme, nous pourrions avoir plusieurs clusters Spark.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"340\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-29.png?resize=1024%2C340&#038;ssl=1\" alt=\"\" class=\"wp-image-685\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-29.png?resize=1024%2C340&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-29.png?resize=300%2C100&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-29.png?resize=768%2C255&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-29.png?w=1215&amp;ssl=1 1215w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption class=\"wp-element-caption\">Pool Spark disponible<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Creation_dun_notebook\"><\/span>Cr\u00e9ation d&rsquo;un notebook<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Le moyen le plus \u00ab\u00a0basique\u00a0\u00bb de cr\u00e9er un notebook Spark est d&rsquo;utiliser la capacit\u00e9 de Synapse \u00e0 g\u00e9n\u00e9rer automatiquement un notebook \u00e0 partir de notre onglet de donn\u00e9es.<\/p>\n\n\n\n<p>Pour ce faire, il y a un certain nombre de clics \u00e0 faire, mais ils sont relativement \u00e9vidents. Il nous suffit de naviguer jusqu&rsquo;au fichier de donn\u00e9es que nous cherchons \u00e0 lire depuis nos donn\u00e9es li\u00e9es. 
Il faut ensuite faire un clic droit sur celui-ci et choisir de cr\u00e9er un nouveau Notebook qui chargera ce fichier dans un DataFrame.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"543\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/10\/image.png?resize=1024%2C543&#038;ssl=1\" alt=\"\" class=\"wp-image-715\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/10\/image.png?resize=1024%2C543&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/10\/image.png?resize=300%2C159&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/10\/image.png?resize=768%2C407&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/10\/image.png?w=1102&amp;ssl=1 1102w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption class=\"wp-element-caption\">G\u00e9n\u00e9ration automatique d&rsquo;un Notebook Spark<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1005\" height=\"321\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-31.png?resize=1005%2C321&#038;ssl=1\" alt=\"\" class=\"wp-image-689\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-31.png?w=1005&amp;ssl=1 1005w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-31.png?resize=300%2C96&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-31.png?resize=768%2C245&amp;ssl=1 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption class=\"wp-element-caption\">Notebook auto-g\u00e9n\u00e9r\u00e9 par Synapse<\/figcaption><\/figure>\n\n\n\n<p>La premi\u00e8re chose qui nous saute aux yeux est le petit message d&rsquo;information qui nous demande de 
s\u00e9lectionner un pool Spark \u00e0 attacher \u00e0 notre Notebook !<\/p>\n\n\n\n<p>En effet, comme \u00e9voqu\u00e9, afin d&rsquo;ex\u00e9cuter du code, nous avons besoin de savoir \u00ab\u00a0o\u00f9\u00a0\u00bb l&rsquo;ex\u00e9cuter. Et cela tombe bien, nous venons de cr\u00e9er un pool Spark !<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"166\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-32.png?resize=1024%2C166&#038;ssl=1\" alt=\"\" class=\"wp-image-690\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-32.png?resize=1024%2C166&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-32.png?resize=300%2C49&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-32.png?resize=768%2C125&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-32.png?w=1042&amp;ssl=1 1042w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption class=\"wp-element-caption\">S\u00e9lection du pool Spark \u00e0 attacher \u00e0 notre Notebook<\/figcaption><\/figure>\n\n\n\n<p>Cette action nous d\u00e9montre qu&rsquo;il est possible de cr\u00e9er plusieurs pools Spark et que Synapse ne peut d\u00e9cider pour nous lequel utiliser. Les raisons d&rsquo;en avoir plusieurs peuvent \u00eatre nombreuses : besoin d&rsquo;une version sp\u00e9cifique, d&rsquo;une taille de n\u0153ud sp\u00e9cifique, ou encore s\u00e9paration des ressources de certains processus, voire de la facturation.<\/p>\n\n\n\n<p>Une fois le pool s\u00e9lectionn\u00e9, nous pouvons d\u00e9j\u00e0 ex\u00e9cuter le code g\u00e9n\u00e9r\u00e9 par Synapse pour avoir un premier aper\u00e7u de ce qui a \u00e9t\u00e9 fait et v\u00e9rifier que tout fonctionne bien ! 
Pour ce faire, il suffit d&rsquo;ex\u00e9cuter la cellule de code qui a \u00e9t\u00e9 g\u00e9n\u00e9r\u00e9e pour nous en cliquant sur le bouton \u00ab\u00a0play\u00a0\u00bb (ou encore \u00ab\u00a0Run all\u00a0\u00bb).<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"223\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-36.png?resize=1024%2C223&#038;ssl=1\" alt=\"\" class=\"wp-image-694\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-36.png?resize=1024%2C223&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-36.png?resize=300%2C65&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-36.png?resize=768%2C167&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-36.png?w=1273&amp;ssl=1 1273w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Demarrage_dune_session_Spark\"><\/span>D\u00e9marrage d&rsquo;une session Spark<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Apr\u00e8s avoir demand\u00e9 l&rsquo;ex\u00e9cution de notre script, nous allons rentrer dans le monde merveilleux de Spark et de ses sessions. 
Nous remarquons que nous allons attendre un peu le temps que notre session (et cluster) d\u00e9marrent.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"227\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-35.png?resize=1024%2C227&#038;ssl=1\" alt=\"\" class=\"wp-image-693\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-35.png?resize=1024%2C227&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-35.png?resize=300%2C66&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-35.png?resize=768%2C170&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-35.png?w=1287&amp;ssl=1 1287w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption class=\"wp-element-caption\">Session Spark en train de d\u00e9marrer<\/figcaption><\/figure>\n\n\n\n<p>Apr\u00e8s quelques minutes d&rsquo;attentes, nous avons notre r\u00e9sultat et notre notebook nous indique les temps de d\u00e9marrage de session et d&rsquo;ex\u00e9cution de notre code. 
<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"520\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-37.png?resize=1024%2C520&#038;ssl=1\" alt=\"\" class=\"wp-image-695\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-37.png?resize=1024%2C520&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-37.png?resize=300%2C152&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-37.png?resize=768%2C390&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-37.png?resize=1536%2C780&amp;ssl=1 1536w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-37.png?w=1540&amp;ssl=1 1540w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption class=\"wp-element-caption\">R\u00e9sultat de notre ex\u00e9cution<\/figcaption><\/figure>\n\n\n\n<p>Ce qu&rsquo;il est important de noter, c&rsquo;est que si notre 1\u00e8re ex\u00e9cution a \u00e9t\u00e9 relativement longue \u00e0 cause du d\u00e9marrage de la session, une deuxi\u00e8me ex\u00e9cution de notre cellule sera nettement plus rapide, principalement parce que notre session reste en vie pendant un certain temps. Tant que notre notebook affiche \u00ab\u00a0Ready\u00a0\u00bb en haut \u00e0 gauche, notre session est pr\u00eate et il n&rsquo;est donc pas n\u00e9cessaire de la recr\u00e9er ! Dans le cas pr\u00e9sent, nous sommes pass\u00e9s de 3 minutes 27 secondes \u00e0 seulement 3 secondes. 
(Il y a s\u00fbrement un peu de cache derri\u00e8re, mais globalement, nous n&rsquo;avons plus les 3 minutes et 6 secondes de d\u00e9marrage de session !)<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1007\" height=\"488\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-38.png?resize=1007%2C488&#038;ssl=1\" alt=\"\" class=\"wp-image-696\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-38.png?w=1007&amp;ssl=1 1007w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-38.png?resize=300%2C145&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-38.png?resize=768%2C372&amp;ssl=1 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Transformation_de_donnee\"><\/span>Transformation de donn\u00e9e<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Une fois notre session Spark d\u00e9marr\u00e9e et la cellule ex\u00e9cut\u00e9e, nous pouvons remarquer que notre notebook affiche un tableau correspondant aux donn\u00e9es lues dans notre fichier d&rsquo;origine. Nous allons donc pouvoir commencer \u00e0 \u00e9crire notre code Spark pour transformer nos donn\u00e9es !<\/p>\n\n\n\n<p>Pour cette partie de \u00ab\u00a0transformation\u00a0\u00bb, nous allons rester tr\u00e8s soft et nous allons simplement construire une vision globale d&rsquo;une ligne de facture en faisant une jointure entre les lignes et les en-t\u00eates. 
\u00c9videmment, cette \u00e9tape d\u00e9pendra du besoin business derri\u00e8re vos travaux et peut-\u00eatre que des traitements plus complexes seront n\u00e9cessaires.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Utilisation_dun_Notebook\"><\/span>Utilisation d&rsquo;un Notebook<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Si vous n&rsquo;\u00eates pas familier des notebooks, il peut \u00eatre un peu d\u00e9routant d&rsquo;arriver sur ce type d&rsquo;interface, tr\u00e8s diff\u00e9rente d&rsquo;un simple code source dans un fichier \u00ab\u00a0.py\u00a0\u00bb. En effet, un notebook peut \u00eatre consid\u00e9r\u00e9 comme une sorte de cahier de TP o\u00f9 nous allons \u00e9crire du code, mais aussi des annotations ainsi que des r\u00e9sultats.<\/p>\n\n\n\n<p>Nos diff\u00e9rents blocs de code ou d&rsquo;annotation sont appel\u00e9s des \u00ab\u00a0cellules\u00a0\u00bb, que nous pourrons d\u00e9finir comme du code ou du Markdown (pour nos annotations). 
Pour ajouter une cellule, il faut placer la souris juste en dessous d&rsquo;une cellule existante pour faire apparaitre deux petits + qui nous donnent le choix d&rsquo;ajouter soit une cellule de Code, soit une cellule de Markdown.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"268\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-40.png?resize=1024%2C268&#038;ssl=1\" alt=\"\" class=\"wp-image-698\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-40.png?resize=1024%2C268&amp;ssl=1 1024w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-40.png?resize=300%2C79&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-40.png?resize=768%2C201&amp;ssl=1 768w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-40.png?w=1043&amp;ssl=1 1043w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption class=\"wp-element-caption\">Apparition des options de cr\u00e9ation de cellules dans un notebook<\/figcaption><\/figure>\n\n\n\n<p>Nous n&rsquo;allons pas nous attarder sur les cellules Markdown qui sont certes utiles, mais pas cruciales (d&rsquo;un point de vue purement technique). 
Si ce format ne vous est pas familier, direction la Cheat Sheet : <a href=\"https:\/\/www.markdownguide.org\/cheat-sheet\/\">Markdown Cheat Sheet | Markdown Guide<\/a>.<\/p>\n\n\n\n<p>A savoir qu&rsquo;il est toujours possible de modifier le type de cellule gr\u00e2ce au menu en haut \u00e0 droite lorsque l&rsquo;on est sur une cellule.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"254\" height=\"383\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-41.png?resize=254%2C383&#038;ssl=1\" alt=\"\" class=\"wp-image-699\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-41.png?w=254&amp;ssl=1 254w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-41.png?resize=199%2C300&amp;ssl=1 199w\" sizes=\"auto, (max-width: 254px) 100vw, 254px\" \/><figcaption class=\"wp-element-caption\">Menu d&rsquo;actions sur une cellule<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Imports_et_declarations\"><\/span>Imports et d\u00e9clarations<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Nous allons partir d&rsquo;un notebook vierge (et donc simplement supprimer le code de la 1\u00e8re cellule g\u00e9n\u00e9r\u00e9e, ou supprimer la cellule) pour d\u00e9marrer \u00ab\u00a0proprement\u00a0\u00bb.<\/p>\n\n\n\n<p>Comme dans tout code source, nous allons dans un premier temps \u00e9crire les diff\u00e9rents imports n\u00e9cessaires \u00e0 notre script. Pour ce faire nous allons donc cr\u00e9er (ou r\u00e9utiliser) une cellule de Code.<\/p>\n\n\n\n<p>Pour notre transformation, nous aurons seulement besoin d&rsquo;\u00eatre capables de retourner une colonne sp\u00e9cifique de notre dataframe. 
Nous utiliserons la fonction \u00ab\u00a0col\u00a0\u00bb (<a href=\"https:\/\/spark.apache.org\/docs\/latest\/api\/python\/reference\/pyspark.sql\/api\/pyspark.sql.functions.col.html\">pyspark.sql.functions.col \u2014 PySpark 3.5.3 documentation (apache.org)<\/a>) que l&rsquo;on trouvera gr\u00e2ce \u00e0 l&rsquo;import suivant :<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\nfrom pyspark.sql.functions import col\n<\/pre><\/div>\n\n\n<p>Ensuite, nous allons d\u00e9finir nos diff\u00e9rents noms de fichiers et chemins de fichiers sur notre lake.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# zone  urls\nbronzeUrl = &#039;abfss:\/\/&lt;name of your container&gt;@&lt;name of your storage account&gt;.dfs.core.windows.net&#039;\nsilverUrl = &#039;abfss:\/\/&lt;name of your container&gt;@&lt;name of your storage account&gt;.dfs.core.windows.net&#039;\n\n# source files\nbronzeInvoiceLinesFile = &#039;\/staging\/InvoiceLines\/InvoiceLines.parquet&#039;\nbronzeInvoicesFile = &#039;\/staging\/Invoices\/Invoices.parquet&#039;\n\n# destination file\nsilverInvoicesFile = &#039;\/Invoices&#039;\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"La_transformation\"><\/span>La transformation !<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Pour commencer, nous allons avoir besoin de lire nos diff\u00e9rents fichiers sources et de les stocker dans un dataframe. 
En nous servant du code que Synapse nous avait g\u00e9n\u00e9r\u00e9 plus t\u00f4t, nous allons remplacer le chemin de nos fichiers par nos variables.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Lecture des deux fichiers parquet de la zone bronze\ndfInvoiceLines = spark.read.load(f&#039;{bronzeUrl}{bronzeInvoiceLinesFile}&#039;, format=&#039;parquet&#039;)\ndfInvoices = spark.read.load(f&#039;{bronzeUrl}{bronzeInvoicesFile}&#039;, format=&#039;parquet&#039;)\n<\/pre><\/div>\n\n\n<p>Nous arrivons maintenant \u00e0 notre premier sujet sensible. Nous voulons joindre deux dataframes qui, si on les analyse un minimum, comportent certaines colonnes portant le m\u00eame nom. Si dans certains cas cela ne pose pas de probl\u00e8me, Spark ne nous laissera pas \u00e9crire un fichier parquet avec deux colonnes qui portent le m\u00eame nom. Nous allons avoir besoin de les renommer \u00e0 un moment donn\u00e9.<\/p>\n\n\n\n<p>D&rsquo;autre part, m\u00eame si dans le cas pr\u00e9sent nous allons faire une jointure entre deux tables de la m\u00eame base de donn\u00e9es source 
et que, de ce fait, le \u00ab\u00a0lineage\u00a0\u00bb est relativement simple, j&rsquo;ai pour habitude de pr\u00e9fixer les noms de colonnes pour identifier plus facilement la provenance des informations et ainsi ne plus avoir de colonnes avec le m\u00eame nom.<\/p>\n\n\n\n<p>Nous n&rsquo;allons \u00e9videmment pas faire ceci \u00e0 la main en \u00e9num\u00e9rant l&rsquo;int\u00e9gralit\u00e9 des colonnes pour plusieurs raisons :<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Nous sommes fain\u00e9ants<\/li>\n\n\n\n<li>Les sch\u00e9mas pourraient changer et nous aurions besoin d&rsquo;aller modifier notre code<\/li>\n\n\n\n<li>Nous sommes fain\u00e9ants !<\/li>\n<\/ul>\n\n\n\n<p>La solution sera, gr\u00e2ce \u00e0 Python, d&rsquo;it\u00e9rer sur l&rsquo;ensemble de nos colonnes (d&rsquo;o\u00f9 le besoin de la fonction col import\u00e9e plus haut), d&rsquo;utiliser la m\u00e9thode alias pour changer le nom de chaque colonne (en ajoutant le pr\u00e9fixe \u00ab\u00a0line_\u00a0\u00bb \u00e0 toutes les colonnes de la table des lignes de factures) et d&rsquo;assigner le dataframe ainsi g\u00e9n\u00e9r\u00e9 \u00e0 notre dataframe de d\u00e9part.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Pr\u00e9fixe chaque colonne des lignes de factures avec &#039;line_&#039;\ndfInvoiceLines = dfInvoiceLines.select(*(col(x).alias(&#039;line_&#039; + x) for x in dfInvoiceLines.columns))\n<\/pre><\/div>\n\n\n<p>C&rsquo;est maintenant le moment tant attendu de la jointure. 
We simply write a left outer join (an inner join would also work if we were certain of the referential integrity between lines and headers).<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Keep every invoice line, matched to its invoice header when one exists\nJoinCondition = &#x5B;dfInvoiceLines.line_InvoiceID==dfInvoices.InvoiceID]\ndfSilverInvoices = dfInvoiceLines.join(dfInvoices,JoinCondition,&quot;left_outer&quot;)\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Nos_donnees_en_zone_Silver\"><\/span>Our data in the Silver zone<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>None of this would be of any use without the final step: writing the resulting file to the Silver zone of our data lake.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: python; title: ; notranslate\" title=\"\">\n# Overwrite any previous output at the target path\ndfSilverInvoices.write.mode(&quot;overwrite&quot;).parquet(silverUrl + silverInvoicesFile)\n<\/pre><\/div>\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Le_Notebook_final\"><\/span>The final Notebook<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Because a notebook and its cells are not the easiest thing to share as &quot;pure code&quot;, here is a screenshot of what your notebook should look like if you copied all of the previous code snippets in order. How the cells are split and whether Markdown annotations are added makes no difference in our example. 
The only thing that matters is the order of the cells, because our Notebook is executed cell by cell, from top to bottom.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"770\" height=\"1021\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-39.png?resize=770%2C1021&#038;ssl=1\" alt=\"\" class=\"wp-image-697\" style=\"width:768px;height:auto\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-39.png?w=770&amp;ssl=1 770w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-39.png?resize=226%2C300&amp;ssl=1 226w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/image-39.png?resize=768%2C1018&amp;ssl=1 768w\" sizes=\"auto, (max-width: 770px) 100vw, 770px\" \/><figcaption class=\"wp-element-caption\">Complete transformation notebook<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>This first Notebook is very simple, and the Spark code in it is neither particularly robust to errors nor technically optimized. It does, however, have the merit of working, which lets us move forward in building our data lakehouse.<\/p>\n\n\n\n<p>Starting from this example, you can duplicate the code, changing the file names (and the calculation rules if needed), to handle the other WWI fact tables. Orders and OrderLines can be processed following exactly the same pattern.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Synapse nous permet d&rsquo;utiliser Apache Spark en tant que Runtime de processing afin de travailler nos diff\u00e9rents datasets. 
Dans cet article, nous allons cr\u00e9er notre premier notebook Spark (et le pool Spark n\u00e9cessaire) pour transformer nos premi\u00e8res donn\u00e9es de la zone bronze pour commencer \u00e0 alimenter notre zone silver. Introduction et pool Spark Un Notebook [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":712,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[76],"tags":[9,80,79],"class_list":["post-680","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-synapsedatalake","tag-azure-synapse-analytics","tag-notebook","tag-spark"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Notre premier notebook Spark dans Synapse - NiceData<\/title>\n<meta name=\"description\" content=\"Nous allons cr\u00e9er notre premier notebook Spark (et le pool Spark n\u00e9cessaire) pour transformer nos premi\u00e8res donn\u00e9es de la zone bronze pour commencer \u00e0 alimenter notre zone silver.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/\" \/>\n<meta property=\"og:locale\" content=\"fr_FR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Notre premier notebook Spark dans Synapse - NiceData\" \/>\n<meta property=\"og:description\" content=\"Nous allons cr\u00e9er notre premier notebook Spark (et le pool Spark n\u00e9cessaire) pour transformer nos premi\u00e8res donn\u00e9es de la zone bronze pour commencer \u00e0 alimenter notre zone silver.\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/\" \/>\n<meta property=\"og:site_name\" content=\"NiceData\" \/>\n<meta property=\"article:published_time\" content=\"2024-08-17T12:09:42+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-10-07T13:07:27+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/notebookDrawing-scaled.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1707\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Jean-Laurent Ferralis\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@JLFerralis\" \/>\n<meta name=\"twitter:site\" content=\"@JLFerralis\" \/>\n<meta name=\"twitter:label1\" content=\"\u00c9crit par\" \/>\n\t<meta name=\"twitter:data1\" content=\"Jean-Laurent Ferralis\" \/>\n\t<meta name=\"twitter:label2\" content=\"Dur\u00e9e de lecture estim\u00e9e\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/\"},\"author\":{\"name\":\"Jean-Laurent Ferralis\",\"@id\":\"https:\/\/www.nicedata.fr\/#\/schema\/person\/8d1ad38004d3b0cf6bff7c200c795e19\"},\"headline\":\"Notre premier notebook Spark dans 
Synapse\",\"datePublished\":\"2024-08-17T12:09:42+00:00\",\"dateModified\":\"2024-10-07T13:07:27+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/\"},\"wordCount\":2261,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\/\/www.nicedata.fr\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/notebookDrawing-scaled.jpg?fit=2560%2C1707&ssl=1\",\"keywords\":[\"Azure Synapse Analytics\",\"Notebook\",\"Spark\"],\"articleSection\":[\"Azure Synapse - Datalake\"],\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/\",\"url\":\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/\",\"name\":\"Notre premier notebook Spark dans Synapse - NiceData\",\"isPartOf\":{\"@id\":\"https:\/\/www.nicedata.fr\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/notebookDrawing-scaled.jpg?fit=2560%2C1707&ssl=1\",\"datePublished\":\"2024-08-17T12:09:42+00:00\",\"dateModified\":\"2024-10-07T13:07:27+00:00\",\"description\":\"Nous allons cr\u00e9er notre premier notebook Spark (et le pool Spark n\u00e9cessaire) pour transformer nos premi\u00e8res donn\u00e9es de la zone 
bronze pour commencer \u00e0 alimenter notre zone silver.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#breadcrumb\"},\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#primaryimage\",\"url\":\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/notebookDrawing-scaled.jpg?fit=2560%2C1707&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/notebookDrawing-scaled.jpg?fit=2560%2C1707&ssl=1\",\"width\":2560,\"height\":1707,\"caption\":\"notebook drawing\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\/\/www.nicedata.fr\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Notre premier notebook Spark dans Synapse\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.nicedata.fr\/#website\",\"url\":\"https:\/\/www.nicedata.fr\/\",\"name\":\"NiceData\",\"description\":\"L&#039;expertise Data du 
sud\",\"publisher\":{\"@id\":\"https:\/\/www.nicedata.fr\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.nicedata.fr\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"fr-FR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.nicedata.fr\/#organization\",\"name\":\"NiceData\",\"url\":\"https:\/\/www.nicedata.fr\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\/\/www.nicedata.fr\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/11\/NICE-DATA_JLFMod.webp?fit=2493%2C1249&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/11\/NICE-DATA_JLFMod.webp?fit=2493%2C1249&ssl=1\",\"width\":2493,\"height\":1249,\"caption\":\"NiceData\"},\"image\":{\"@id\":\"https:\/\/www.nicedata.fr\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/JLFerralis\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.nicedata.fr\/#\/schema\/person\/8d1ad38004d3b0cf6bff7c200c795e19\",\"name\":\"Jean-Laurent Ferralis\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/3cbfdae273d44fb82b902a3451eb0db37485119e34cb70c1fdd186c9b731b9f3?s=96&d=mm&r=g\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/3cbfdae273d44fb82b902a3451eb0db37485119e34cb70c1fdd186c9b731b9f3?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/3cbfdae273d44fb82b902a3451eb0db37485119e34cb70c1fdd186c9b731b9f3?s=96&d=mm&r=g\",\"caption\":\"Jean-Laurent Ferralis\"},\"description\":\"French Data Professionnal - BI consultant and #sql lover. I also #swimbikerun when possible ! 
Living in @villedenice\",\"sameAs\":[\"http:\/\/xp-it.com\"],\"url\":\"https:\/\/www.nicedata.fr\/index.php\/author\/jlf\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Notre premier notebook Spark dans Synapse - NiceData","description":"Nous allons cr\u00e9er notre premier notebook Spark (et le pool Spark n\u00e9cessaire) pour transformer nos premi\u00e8res donn\u00e9es de la zone bronze pour commencer \u00e0 alimenter notre zone silver.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/","og_locale":"fr_FR","og_type":"article","og_title":"Notre premier notebook Spark dans Synapse - NiceData","og_description":"Nous allons cr\u00e9er notre premier notebook Spark (et le pool Spark n\u00e9cessaire) pour transformer nos premi\u00e8res donn\u00e9es de la zone bronze pour commencer \u00e0 alimenter notre zone silver.","og_url":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/","og_site_name":"NiceData","article_published_time":"2024-08-17T12:09:42+00:00","article_modified_time":"2024-10-07T13:07:27+00:00","og_image":[{"width":2560,"height":1707,"url":"https:\/\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/notebookDrawing-scaled.jpg","type":"image\/jpeg"}],"author":"Jean-Laurent Ferralis","twitter_card":"summary_large_image","twitter_creator":"@JLFerralis","twitter_site":"@JLFerralis","twitter_misc":{"\u00c9crit par":"Jean-Laurent Ferralis","Dur\u00e9e de lecture estim\u00e9e":"14 
minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#article","isPartOf":{"@id":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/"},"author":{"name":"Jean-Laurent Ferralis","@id":"https:\/\/www.nicedata.fr\/#\/schema\/person\/8d1ad38004d3b0cf6bff7c200c795e19"},"headline":"Notre premier notebook Spark dans Synapse","datePublished":"2024-08-17T12:09:42+00:00","dateModified":"2024-10-07T13:07:27+00:00","mainEntityOfPage":{"@id":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/"},"wordCount":2261,"commentCount":1,"publisher":{"@id":"https:\/\/www.nicedata.fr\/#organization"},"image":{"@id":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/notebookDrawing-scaled.jpg?fit=2560%2C1707&ssl=1","keywords":["Azure Synapse Analytics","Notebook","Spark"],"articleSection":["Azure Synapse - Datalake"],"inLanguage":"fr-FR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/","url":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/","name":"Notre premier notebook Spark dans Synapse - 
NiceData","isPartOf":{"@id":"https:\/\/www.nicedata.fr\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#primaryimage"},"image":{"@id":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/notebookDrawing-scaled.jpg?fit=2560%2C1707&ssl=1","datePublished":"2024-08-17T12:09:42+00:00","dateModified":"2024-10-07T13:07:27+00:00","description":"Nous allons cr\u00e9er notre premier notebook Spark (et le pool Spark n\u00e9cessaire) pour transformer nos premi\u00e8res donn\u00e9es de la zone bronze pour commencer \u00e0 alimenter notre zone silver.","breadcrumb":{"@id":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#breadcrumb"},"inLanguage":"fr-FR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/"]}]},{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#primaryimage","url":"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/notebookDrawing-scaled.jpg?fit=2560%2C1707&ssl=1","contentUrl":"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/notebookDrawing-scaled.jpg?fit=2560%2C1707&ssl=1","width":2560,"height":1707,"caption":"notebook drawing"},{"@type":"BreadcrumbList","@id":"https:\/\/www.nicedata.fr\/index.php\/2024\/08\/17\/notre-premier-notebook-spark-dans-synapse\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.nicedata.fr\/"},{"@type":"ListItem","position":2,"name":"Notre premier notebook Spark dans 
Synapse"}]},{"@type":"WebSite","@id":"https:\/\/www.nicedata.fr\/#website","url":"https:\/\/www.nicedata.fr\/","name":"NiceData","description":"L&#039;expertise Data du sud","publisher":{"@id":"https:\/\/www.nicedata.fr\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.nicedata.fr\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"fr-FR"},{"@type":"Organization","@id":"https:\/\/www.nicedata.fr\/#organization","name":"NiceData","url":"https:\/\/www.nicedata.fr\/","logo":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/www.nicedata.fr\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/11\/NICE-DATA_JLFMod.webp?fit=2493%2C1249&ssl=1","contentUrl":"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/11\/NICE-DATA_JLFMod.webp?fit=2493%2C1249&ssl=1","width":2493,"height":1249,"caption":"NiceData"},"image":{"@id":"https:\/\/www.nicedata.fr\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/JLFerralis"]},{"@type":"Person","@id":"https:\/\/www.nicedata.fr\/#\/schema\/person\/8d1ad38004d3b0cf6bff7c200c795e19","name":"Jean-Laurent Ferralis","image":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/secure.gravatar.com\/avatar\/3cbfdae273d44fb82b902a3451eb0db37485119e34cb70c1fdd186c9b731b9f3?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/3cbfdae273d44fb82b902a3451eb0db37485119e34cb70c1fdd186c9b731b9f3?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/3cbfdae273d44fb82b902a3451eb0db37485119e34cb70c1fdd186c9b731b9f3?s=96&d=mm&r=g","caption":"Jean-Laurent Ferralis"},"description":"French Data Professionnal - BI consultant and #sql lover. I also #swimbikerun when possible ! 
Living in @villedenice","sameAs":["http:\/\/xp-it.com"],"url":"https:\/\/www.nicedata.fr\/index.php\/author\/jlf\/"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/09\/notebookDrawing-scaled.jpg?fit=2560%2C1707&ssl=1","jetpack-related-posts":[],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/posts\/680","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/comments?post=680"}],"version-history":[{"count":7,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/posts\/680\/revisions"}],"predecessor-version":[{"id":716,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/posts\/680\/revisions\/716"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/media\/712"}],"wp:attachment":[{"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/media?parent=680"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/categories?post=680"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/tags?post=680"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}