{"id":215,"date":"2022-09-01T08:52:07","date_gmt":"2022-09-01T06:52:07","guid":{"rendered":"https:\/\/www.nicedata.fr\/?p=215"},"modified":"2024-08-20T14:49:18","modified_gmt":"2024-08-20T12:49:18","slug":"generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics","status":"publish","type":"post","link":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/","title":{"rendered":"Generating a partitioned dataset with an Azure Synapse Analytics data flow"},"content":{"rendered":"\n<p>To optimize queries run against Parquet files from a serverless SQL pool, we generally work with partitioned datasets. This article describes how to generate a partitioned dataset with an Azure Synapse Analytics data flow.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Introduction\"><\/span>Introduction<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>When you look at the folder hierarchy of a data lake, you very often find a partitioned layout, most commonly partitioned by date. The main reason is to optimize data extraction queries, which often target specific date ranges.<\/p>\n\n\n\n<p>Very common use cases include:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Selecting the data for a specific month or year<\/li><li>Selecting the data since a specific month or year<\/li><li>Selecting the data within a date range<\/li><\/ul>\n\n\n\n<p>With large volumes of data, serving these queries by reading a tiny fraction of the rows out of very large Parquet files can become very resource-intensive.<\/p>\n\n\n\n<p>This is where partitioning comes in. 
Instead of one &#8220;big&#8221; Parquet file that (despite all the optimizations of the format) has to be scanned to find just the data we need, we read only a precise selection of small files.<\/p>\n\n\n\n<p>To do so, we need to change the way we write our data and go from:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"120\" height=\"24\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-53.png?resize=120%2C24&#038;ssl=1\" alt=\"\" class=\"wp-image-216\"\/><figcaption>One &#8220;big&#8221; Parquet file<\/figcaption><\/figure>\n\n\n\n<p>to:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"912\" height=\"396\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-54.png?resize=912%2C396&#038;ssl=1\" alt=\"\" class=\"wp-image-217\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-54.png?w=912&amp;ssl=1 912w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-54.png?resize=300%2C130&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-54.png?resize=768%2C333&amp;ssl=1 768w\" sizes=\"auto, (max-width: 912px) 100vw, 912px\" \/><figcaption>Data partitioned into multiple Parquet files<\/figcaption><\/figure>\n\n\n\n<p>Seen this way, it obviously takes a bit of work. 
There are many ways to do this partitioning; here we will produce it with Azure Synapse Analytics pipelines (and it works with ADF too, of course).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Cas_dusage_%E2%80%93_Generer_un_dataset_partitionne_de_nos_factures\"><\/span>Use case &#8211; Generating a partitioned dataset of our invoices<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>This article can stand on its own if you use your own file and adapt the steps accordingly, but it uses the output generated in the article: <a href=\"https:\/\/www.nicedata.fr\/index.php\/2022\/08\/19\/parametrer-les-pipeline-azure-synapse-analytics-sans-ouvrir-synapse\/\" target=\"_blank\" rel=\"noreferrer noopener\">Param\u00e9trer les Pipeline Azure Synapse Analytics sans ouvrir Synapse !<\/a><\/p><\/blockquote>\n\n\n\n<p>Taking Microsoft&rsquo;s WWI (Wide World Importers) sample database as the source, we see that our invoices are split across two tables, &#8220;Invoices&#8221; and &#8220;InvoiceLines&#8221;. This is a very standard design in which a &#8220;header&#8221; table holds the data shared by several &#8220;lines&#8221; of data.<\/p>\n\n\n\n<p>While this design is particularly well suited to the relational model, in the big data world a single &#8220;big flat table&#8221; in a file is very often preferred. 
This lets our scripts and users work from a single dataset that contains all their data, without having to perform joins or other recurring transformations.<\/p>\n\n\n\n<p>For invoices, the price, quantity and item information is on the &#8220;lines&#8221;, whereas the date, the customer and the delivery address are in the headers. To make this data easier to use, we will create an &#8220;Invoice&#8221; dataset that joins the two tables and, to optimize future queries, we will partition it by year\/month.<\/p>\n\n\n\n<p>And since we are creating a new &#8220;smart dataset&#8221;, we will place it in the &#8220;Curated&#8221; zone of our data lake so that it can eventually be exposed to users.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Les_mains_dedans\"><\/span>Hands on!<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The resources used in this experiment are:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>A Synapse workspace (the foundation)<\/li><li>An Azure Data Lake Storage Gen2 account (the data lake)<\/li><li>Two Parquet files containing the data of the &#8220;Invoices&#8221; and &#8220;InvoiceLines&#8221; tables of the WWI database (see the article <a href=\"https:\/\/www.nicedata.fr\/index.php\/2022\/08\/19\/parametrer-les-pipeline-azure-synapse-analytics-sans-ouvrir-synapse\/\" target=\"_blank\" rel=\"noreferrer noopener\">Param\u00e9trer les Pipeline Azure Synapse Analytics sans ouvrir Synapse !<\/a> to generate them)<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Vue_densemble\"><\/span>Overview<span 
class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>The solution is a data flow made up of the following transformations:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Two sources<\/li><li>A join<\/li><li>A derived column<\/li><li>A sink<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"876\" height=\"242\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-55.png?resize=876%2C242&#038;ssl=1\" alt=\"\" class=\"wp-image-220\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-55.png?w=876&amp;ssl=1 876w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-55.png?resize=300%2C83&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-55.png?resize=768%2C212&amp;ssl=1 768w\" sizes=\"auto, (max-width: 876px) 100vw, 876px\" \/><figcaption>Overview of the data flow<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Creation_du_data_flow\"><\/span>Creating the data flow<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>To get started, we go to the Develop hub and create a new data flow:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"657\" height=\"344\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-56.png?resize=657%2C344&#038;ssl=1\" alt=\"\" class=\"wp-image-221\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-56.png?w=657&amp;ssl=1 657w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-56.png?resize=300%2C157&amp;ssl=1 300w\" sizes=\"auto, (max-width: 657px) 100vw, 657px\" \/><figcaption>Creating a data 
flow<\/figcaption><\/figure>\n\n\n\n<p>We now land in the data flow editor, which immediately offers to add a new data source and to name our data flow:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"976\" height=\"388\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-57.png?resize=976%2C388&#038;ssl=1\" alt=\"\" class=\"wp-image-222\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-57.png?w=976&amp;ssl=1 976w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-57.png?resize=300%2C119&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-57.png?resize=768%2C305&amp;ssl=1 768w\" sizes=\"auto, (max-width: 976px) 100vw, 976px\" \/><figcaption>Data flow editor<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Ajout_de_la_1ere_source\"><\/span>Adding the first source<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>When adding the source, the editor asks us to fill in two important fields:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>the &#8220;box name&#8221;: yes, it matters if we want to find our way around instead of leaving every default and ending up with source1 \/ source2 \/ &#8230; in our flows!<\/li><li>the source &#8220;Dataset&#8221;: we will connect our invoice header file &#8220;Invoices.parquet&#8221;<\/li><\/ul>\n\n\n\n<h4 class=\"wp-block-heading\">Creating the source dataset<\/h4>\n\n\n\n<p>This operation comes up repeatedly, so we will describe it for the first dataset and you will then be on your own for the following ones \ud83d\ude09 (as long as they have nothing special about them).<\/p>\n\n\n\n<p>So, quite simply, we click &#8220;+&#8221;:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"572\" height=\"40\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-59.png?resize=572%2C40&#038;ssl=1\" alt=\"\" class=\"wp-image-224\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-59.png?w=572&amp;ssl=1 572w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-59.png?resize=300%2C21&amp;ssl=1 300w\" sizes=\"auto, (max-width: 572px) 100vw, 572px\" \/><figcaption>Creating a new dataset<\/figcaption><\/figure>\n\n\n\n<p>The first step is to select the &#8220;data store&#8221;, that is, the type of storage where our file is located. In our case, it is Azure Data Lake Storage Gen2:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-60.png?resize=632%2C897&#038;ssl=1\" alt=\"\" class=\"wp-image-225\" width=\"632\" height=\"897\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-60.png?w=632&amp;ssl=1 632w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-60.png?resize=211%2C300&amp;ssl=1 211w\" sizes=\"auto, (max-width: 632px) 100vw, 632px\" \/><figcaption>Selecting the data store type<\/figcaption><\/figure>\n\n\n\n<p>Next we must select the file format. 
We are using Parquet files:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"632\" height=\"898\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-61.png?resize=632%2C898&#038;ssl=1\" alt=\"\" class=\"wp-image-226\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-61.png?w=632&amp;ssl=1 632w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-61.png?resize=211%2C300&amp;ssl=1 211w\" sizes=\"auto, (max-width: 632px) 100vw, 632px\" \/><figcaption>Selecting the file format<\/figcaption><\/figure>\n\n\n\n<p>We must now give the dataset a name and choose the linked service used to reach our file. We therefore pick our data lake:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"632\" height=\"361\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-62.png?resize=632%2C361&#038;ssl=1\" alt=\"\" class=\"wp-image-227\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-62.png?w=632&amp;ssl=1 632w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-62.png?resize=300%2C171&amp;ssl=1 300w\" sizes=\"auto, (max-width: 632px) 100vw, 632px\" \/><figcaption>Selecting the linked service<\/figcaption><\/figure>\n\n\n\n<p>We then configure the path to our file. 
To do so, we can type the path by hand or use the file browser:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"602\" height=\"140\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-63.png?resize=602%2C140&#038;ssl=1\" alt=\"\" class=\"wp-image-228\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-63.png?w=602&amp;ssl=1 602w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-63.png?resize=300%2C70&amp;ssl=1 300w\" sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><figcaption>Defining the file path<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"396\" height=\"442\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-64.png?resize=396%2C442&#038;ssl=1\" alt=\"\" class=\"wp-image-229\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-64.png?w=396&amp;ssl=1 396w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-64.png?resize=269%2C300&amp;ssl=1 269w\" sizes=\"auto, (max-width: 396px) 100vw, 396px\" \/><figcaption>Browsing our lake<\/figcaption><\/figure>\n\n\n\n<p>Our dataset is now configured:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"633\" height=\"381\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-65.png?resize=633%2C381&#038;ssl=1\" alt=\"\" class=\"wp-image-230\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-65.png?w=633&amp;ssl=1 633w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-65.png?resize=300%2C181&amp;ssl=1 300w\" sizes=\"auto, (max-width: 633px) 100vw, 633px\" 
\/><figcaption>Dataset configuration complete<\/figcaption><\/figure>\n\n\n\n<p>The rest of the configuration can be left at its defaults. If needed, we can preview the data by starting a debug session.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"801\" height=\"603\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-58.png?resize=801%2C603&#038;ssl=1\" alt=\"\" class=\"wp-image-223\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-58.png?w=801&amp;ssl=1 801w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-58.png?resize=300%2C226&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-58.png?resize=768%2C578&amp;ssl=1 768w\" sizes=\"auto, (max-width: 801px) 100vw, 801px\" \/><figcaption>Our data source is now configured<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Ajout_de_la_2eme_source\"><\/span>Adding the second source<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>The second source is configured exactly as before, simply changing the source file to &#8220;InvoiceLines.parquet&#8221;.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"809\" height=\"678\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-66.png?resize=809%2C678&#038;ssl=1\" alt=\"\" class=\"wp-image-231\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-66.png?w=809&amp;ssl=1 809w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-66.png?resize=300%2C251&amp;ssl=1 300w, 
https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-66.png?resize=768%2C644&amp;ssl=1 768w\" sizes=\"auto, (max-width: 809px) 100vw, 809px\" \/><figcaption>Both of our sources are configured<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Creation_de_la_jointure\"><\/span>Creating the join<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Clicking the small &#8220;+&#8221; lets us add our join transformation:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"441\" height=\"323\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-67.png?resize=441%2C323&#038;ssl=1\" alt=\"\" class=\"wp-image-232\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-67.png?w=441&amp;ssl=1 441w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-67.png?resize=300%2C220&amp;ssl=1 300w\" sizes=\"auto, (max-width: 441px) 100vw, 441px\" \/><figcaption>Adding the join<\/figcaption><\/figure>\n\n\n\n<p>The settings are simple. We want to attach its lines to each header, so we do a left join from &#8220;Invoices&#8221; to &#8220;InvoiceLines&#8221;. 
The join key is usually simple (if the source model is clean) and in our case it is &#8220;InvoiceID&#8221; = &#8220;InvoiceID&#8221;:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"837\" height=\"414\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-68.png?resize=837%2C414&#038;ssl=1\" alt=\"\" class=\"wp-image-233\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-68.png?w=837&amp;ssl=1 837w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-68.png?resize=300%2C148&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-68.png?resize=768%2C380&amp;ssl=1 768w\" sizes=\"auto, (max-width: 837px) 100vw, 837px\" \/><figcaption>Configuring the left join<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Creation_des_colonnes_derivees\"><\/span>Creating the derived columns<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>So far we have mostly prepared our data. We now reach the point where we prepare the partitioning. As a reminder, we want to partition by year\/month. We do have an invoice date (InvoiceDate) and, although it is possible to configure the partitioning with a specific expression, I strongly recommend generating the values properly in columns and then using those columns directly in the partitioning.<\/p>\n\n\n\n<p>To do this, we will generate two columns, year and month, from our invoice date using a derived column transformation. 
So we add the transformation:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"665\" height=\"371\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-69.png?resize=665%2C371&#038;ssl=1\" alt=\"\" class=\"wp-image-235\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-69.png?w=665&amp;ssl=1 665w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-69.png?resize=300%2C167&amp;ssl=1 300w\" sizes=\"auto, (max-width: 665px) 100vw, 665px\" \/><figcaption>Adding the derived columns<\/figcaption><\/figure>\n\n\n\n<p>The configuration is very simple in our case: we just add the new columns and write the corresponding expressions:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>year &#8211;&gt; year(InvoiceDate)<\/li><li>month &#8211;&gt; month(InvoiceDate)<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1018\" height=\"315\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-70.png?resize=1018%2C315&#038;ssl=1\" alt=\"\" class=\"wp-image-237\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-70.png?w=1018&amp;ssl=1 1018w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-70.png?resize=300%2C93&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-70.png?resize=768%2C238&amp;ssl=1 768w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption>Creating the derived columns<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Creation_des_partitions\"><\/span>Creating the partitions<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>Our dataset is ready; all that remains is to write it to its destination while configuring the partitioning. Everything is configured on the destination, so we add a sink:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"878\" height=\"372\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-71.png?resize=878%2C372&#038;ssl=1\" alt=\"\" class=\"wp-image-238\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-71.png?w=878&amp;ssl=1 878w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-71.png?resize=300%2C127&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-71.png?resize=768%2C325&amp;ssl=1 768w\" sizes=\"auto, (max-width: 878px) 100vw, 878px\" \/><figcaption>Adding a sink<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Creating the destination dataset<\/h4>\n\n\n\n<p>Technically speaking, creating the dataset follows the same steps as before, except that here we do not want to hard-code the file path or the file names, precisely because we intend to partition our dataset.<\/p>\n\n\n\n<p>To do so, we simply configure our dataset with only the known parts of the file path. We want our dataset to be partitioned in the &#8220;curated&#8221; zone of our lake, in the &#8220;invoice&#8221; folder of that zone. 
That is what we enter, without specifying a file name:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"961\" height=\"424\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-72.png?resize=961%2C424&#038;ssl=1\" alt=\"\" class=\"wp-image-239\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-72.png?w=961&amp;ssl=1 961w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-72.png?resize=300%2C132&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-72.png?resize=768%2C339&amp;ssl=1 768w\" sizes=\"auto, (max-width: 961px) 100vw, 961px\" \/><figcaption>Defining a generic dataset<\/figcaption><\/figure>\n\n\n\n<p>Our destination is now configured:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"889\" height=\"631\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-73.png?resize=889%2C631&#038;ssl=1\" alt=\"\" class=\"wp-image-240\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-73.png?w=889&amp;ssl=1 889w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-73.png?resize=300%2C213&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-73.png?resize=768%2C545&amp;ssl=1 768w\" sizes=\"auto, (max-width: 889px) 100vw, 889px\" \/><figcaption>Configuring the sink<\/figcaption><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Defining the partitioning<\/h4>\n\n\n\n<p>Although our destination can be configured in great detail, what interests us in this article is the partitioning, and for that we head to the Optimize tab of our sink.<\/p>\n\n\n\n<p>Several options are available here. However, all the work done so far was aimed at defining our partitions properly by creating dedicated columns. We therefore choose key partitioning, for which we list our columns in order (year, then month):<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"767\" height=\"427\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-74.png?resize=767%2C427&#038;ssl=1\" alt=\"\" class=\"wp-image-241\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-74.png?w=767&amp;ssl=1 767w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-74.png?resize=300%2C167&amp;ssl=1 300w\" sizes=\"auto, (max-width: 767px) 100vw, 767px\" \/><figcaption>Defining the partitioning<\/figcaption><\/figure>\n\n\n\n<p>It is precisely this configuration that will &#8220;split&#8221; our dataset into several files, placed in a separate folder for each value of the &#8220;year&#8221; column and, within each of those, in a separate folder for each value of &#8220;month&#8221;. The interface tells us that each partition will have unique values for each of the defined columns.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Verification\"><\/span>Verification<span class=\"ez-toc-section-end\"><\/span><\/h3>\n\n\n\n<p>All that is left is to run our data flow to partition our dataset. To do so, we can add it to an existing pipeline or create a new one. 
<\/p>\n\n\n\n<p>Here, we create a new pipeline and add a \u201cData flow\u201d activity configured to run the data flow developed earlier.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"844\" height=\"794\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-75.png?resize=844%2C794&#038;ssl=1\" alt=\"\" class=\"wp-image-243\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-75.png?w=844&amp;ssl=1 844w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-75.png?resize=300%2C282&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-75.png?resize=768%2C723&amp;ssl=1 768w\" sizes=\"auto, (max-width: 844px) 100vw, 844px\" \/><figcaption>Creating a test pipeline<\/figcaption><\/figure>\n\n\n\n<p>We make sure our data lake is \u201cclean\u201d so that the result is easy to observe:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"798\" height=\"393\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-76.png?resize=798%2C393&#038;ssl=1\" alt=\"\" class=\"wp-image-244\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-76.png?w=798&amp;ssl=1 798w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-76.png?resize=300%2C148&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-76.png?resize=768%2C378&amp;ssl=1 768w\" sizes=\"auto, (max-width: 798px) 100vw, 798px\" \/><figcaption>\u201cEmpty\u201d data lake<\/figcaption><\/figure>\n\n\n\n<p>Let us now run (or debug) our pipeline 
to actually process our file:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"982\" height=\"436\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-77.png?resize=982%2C436&#038;ssl=1\" alt=\"\" class=\"wp-image-245\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-77.png?w=982&amp;ssl=1 982w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-77.png?resize=300%2C133&amp;ssl=1 300w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-77.png?resize=768%2C341&amp;ssl=1 768w\" sizes=\"auto, (max-width: 982px) 100vw, 982px\" \/><figcaption>Pipeline executed<\/figcaption><\/figure>\n\n\n\n<p>If all went well, once the pipeline has succeeded we can browse our new partitioned dataset:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"407\" height=\"474\" src=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-78.png?resize=407%2C474&#038;ssl=1\" alt=\"\" class=\"wp-image-246\" srcset=\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-78.png?w=407&amp;ssl=1 407w, https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2022\/08\/image-78.png?resize=258%2C300&amp;ssl=1 258w\" sizes=\"auto, (max-width: 407px) 100vw, 407px\" \/><figcaption>Partitioned dataset<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Partitioning a dataset to optimize later reads is not technically difficult, but it does require some upfront thought about how to define the 
partitions.<\/p>\n\n\n\n<p>This article set out to describe partitioning with Synapse data flows, but the same kind of partitioning can of course be produced in other ways, notably with Spark directly.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Pour_aller_plus_loin\"><\/span>Going further<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>To make the best use of this new dataset with a serverless pool, you can now watch the excellent @Azure_Synapse video on the subject with Stijn Wynants (@SQLStijn) and Filip Popovic: <a href=\"https:\/\/youtu.be\/UT3Rj6Jfh4U\" target=\"_blank\" rel=\"noreferrer noopener\">Synapse Espresso: Partitioning<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Afin d&rsquo;optimiser des les requ\u00eates effectu\u00e9s sur des fichiers parquets depuis un pool serverless, on travail g\u00e9n\u00e9ralement avec des dataset partitionn\u00e9. 
Cette article d\u00e9crit comment g\u00e9n\u00e9rer un dataset partitionn\u00e9 via un data flow Azure Synapse Analystics.<\/p>\n","protected":false},"author":2,"featured_media":252,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[76],"tags":[10,9,32,8,30],"class_list":["post-215","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-synapsedatalake","tag-azure","tag-azure-synapse-analytics","tag-datalake","tag-microsoft","tag-parquet"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>G\u00e9n\u00e9rer un dataset partitionn\u00e9 via un data flow Azure Synapse Analytics - NiceData<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/\" \/>\n<meta property=\"og:locale\" content=\"fr_FR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"G\u00e9n\u00e9rer un dataset partitionn\u00e9 via un data flow Azure Synapse Analytics - NiceData\" \/>\n<meta property=\"og:description\" content=\"Afin d&#039;optimiser des les requ\u00eates effectu\u00e9s sur des fichiers parquets depuis un pool serverless, on travail g\u00e9n\u00e9ralement avec des dataset partitionn\u00e9. 
Cette article d\u00e9crit comment g\u00e9n\u00e9rer un dataset partitionn\u00e9 via un data flow Azure Synapse Analystics.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/\" \/>\n<meta property=\"og:site_name\" content=\"NiceData\" \/>\n<meta property=\"article:published_time\" content=\"2022-09-01T06:52:07+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-08-20T12:49:18+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/images.unsplash.com\/photo-1544383835-bda2bc66a55d?ixid=MnwzNTY4Mjl8MHwxfGFsbHx8fHx8fHx8fDE2NjE5MzU1MTQ&ixlib=rb-1.2.1&fm=jpg&q=85&fit=crop&w=2560&h=1777\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1777\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Jean-Laurent Ferralis\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@JLFerralis\" \/>\n<meta name=\"twitter:site\" content=\"@JLFerralis\" \/>\n<meta name=\"twitter:label1\" content=\"\u00c9crit par\" \/>\n\t<meta name=\"twitter:data1\" content=\"Jean-Laurent Ferralis\" \/>\n\t<meta name=\"twitter:label2\" content=\"Dur\u00e9e de lecture estim\u00e9e\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/\"},\"author\":{\"name\":\"Jean-Laurent 
Ferralis\",\"@id\":\"https:\/\/www.nicedata.fr\/#\/schema\/person\/8d1ad38004d3b0cf6bff7c200c795e19\"},\"headline\":\"G\u00e9n\u00e9rer un dataset partitionn\u00e9 via un data flow Azure Synapse Analytics\",\"datePublished\":\"2022-09-01T06:52:07+00:00\",\"dateModified\":\"2024-08-20T12:49:18+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/\"},\"wordCount\":1852,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\/\/www.nicedata.fr\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/images.unsplash.com\/photo-1544383835-bda2bc66a55d?ixid=MnwzNTY4Mjl8MHwxfGFsbHx8fHx8fHx8fDE2NjE5MzU1MTQ&ixlib=rb-1.2.1&fm=jpg&q=85&fit=crop&w=2560&h=1777\",\"keywords\":[\"Azure\",\"Azure Synapse Analytics\",\"datalake\",\"Microsoft\",\"parquet\"],\"articleSection\":[\"Azure Synapse - Datalake\"],\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/\",\"url\":\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/\",\"name\":\"G\u00e9n\u00e9rer un dataset partitionn\u00e9 via un data flow Azure Synapse Analytics - 
NiceData\",\"isPartOf\":{\"@id\":\"https:\/\/www.nicedata.fr\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/images.unsplash.com\/photo-1544383835-bda2bc66a55d?ixid=MnwzNTY4Mjl8MHwxfGFsbHx8fHx8fHx8fDE2NjE5MzU1MTQ&ixlib=rb-1.2.1&fm=jpg&q=85&fit=crop&w=2560&h=1777\",\"datePublished\":\"2022-09-01T06:52:07+00:00\",\"dateModified\":\"2024-08-20T12:49:18+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#breadcrumb\"},\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#primaryimage\",\"url\":\"https:\/\/images.unsplash.com\/photo-1544383835-bda2bc66a55d?ixid=MnwzNTY4Mjl8MHwxfGFsbHx8fHx8fHx8fDE2NjE5MzU1MTQ&ixlib=rb-1.2.1&fm=jpg&q=85&fit=crop&w=2560&h=1777\",\"contentUrl\":\"https:\/\/images.unsplash.com\/photo-1544383835-bda2bc66a55d?ixid=MnwzNTY4Mjl8MHwxfGFsbHx8fHx8fHx8fDE2NjE5MzU1MTQ&ixlib=rb-1.2.1&fm=jpg&q=85&fit=crop&w=2560&h=1777\",\"width\":2560,\"height\":1777,\"caption\":\"Photo par Jan Antonin Kolar sur 
Unsplash\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Accueil\",\"item\":\"https:\/\/www.nicedata.fr\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"G\u00e9n\u00e9rer un dataset partitionn\u00e9 via un data flow Azure Synapse Analytics\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.nicedata.fr\/#website\",\"url\":\"https:\/\/www.nicedata.fr\/\",\"name\":\"NiceData\",\"description\":\"L&#039;expertise Data du sud\",\"publisher\":{\"@id\":\"https:\/\/www.nicedata.fr\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.nicedata.fr\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"fr-FR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.nicedata.fr\/#organization\",\"name\":\"NiceData\",\"url\":\"https:\/\/www.nicedata.fr\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\/\/www.nicedata.fr\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/11\/NICE-DATA_JLFMod.webp?fit=2493%2C1249&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/11\/NICE-DATA_JLFMod.webp?fit=2493%2C1249&ssl=1\",\"width\":2493,\"height\":1249,\"caption\":\"NiceData\"},\"image\":{\"@id\":\"https:\/\/www.nicedata.fr\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/JLFerralis\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.nicedata.fr\/#\/schema\/person\/8d1ad38004d3b0cf6bff7c200c795e19\",\"name\":\"Jean-Laurent 
Ferralis\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/3cbfdae273d44fb82b902a3451eb0db37485119e34cb70c1fdd186c9b731b9f3?s=96&d=mm&r=g\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/3cbfdae273d44fb82b902a3451eb0db37485119e34cb70c1fdd186c9b731b9f3?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/3cbfdae273d44fb82b902a3451eb0db37485119e34cb70c1fdd186c9b731b9f3?s=96&d=mm&r=g\",\"caption\":\"Jean-Laurent Ferralis\"},\"description\":\"French Data Professionnal - BI consultant and #sql lover. I also #swimbikerun when possible ! Living in @villedenice\",\"sameAs\":[\"http:\/\/xp-it.com\"],\"url\":\"https:\/\/www.nicedata.fr\/index.php\/author\/jlf\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"G\u00e9n\u00e9rer un dataset partitionn\u00e9 via un data flow Azure Synapse Analytics - NiceData","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/","og_locale":"fr_FR","og_type":"article","og_title":"G\u00e9n\u00e9rer un dataset partitionn\u00e9 via un data flow Azure Synapse Analytics - NiceData","og_description":"Afin d'optimiser des les requ\u00eates effectu\u00e9s sur des fichiers parquets depuis un pool serverless, on travail g\u00e9n\u00e9ralement avec des dataset partitionn\u00e9. 
Cette article d\u00e9crit comment g\u00e9n\u00e9rer un dataset partitionn\u00e9 via un data flow Azure Synapse Analystics.","og_url":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/","og_site_name":"NiceData","article_published_time":"2022-09-01T06:52:07+00:00","article_modified_time":"2024-08-20T12:49:18+00:00","og_image":[{"height":1777,"width":2560,"url":"https:\/\/images.unsplash.com\/photo-1544383835-bda2bc66a55d?ixid=MnwzNTY4Mjl8MHwxfGFsbHx8fHx8fHx8fDE2NjE5MzU1MTQ&ixlib=rb-1.2.1&fm=jpg&q=85&fit=crop&w=2560&h=1777","type":"image\/jpeg"}],"author":"Jean-Laurent Ferralis","twitter_card":"summary_large_image","twitter_creator":"@JLFerralis","twitter_site":"@JLFerralis","twitter_misc":{"\u00c9crit par":"Jean-Laurent Ferralis","Dur\u00e9e de lecture estim\u00e9e":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#article","isPartOf":{"@id":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/"},"author":{"name":"Jean-Laurent Ferralis","@id":"https:\/\/www.nicedata.fr\/#\/schema\/person\/8d1ad38004d3b0cf6bff7c200c795e19"},"headline":"G\u00e9n\u00e9rer un dataset partitionn\u00e9 via un data flow Azure Synapse 
Analytics","datePublished":"2022-09-01T06:52:07+00:00","dateModified":"2024-08-20T12:49:18+00:00","mainEntityOfPage":{"@id":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/"},"wordCount":1852,"commentCount":1,"publisher":{"@id":"https:\/\/www.nicedata.fr\/#organization"},"image":{"@id":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#primaryimage"},"thumbnailUrl":"https:\/\/images.unsplash.com\/photo-1544383835-bda2bc66a55d?ixid=MnwzNTY4Mjl8MHwxfGFsbHx8fHx8fHx8fDE2NjE5MzU1MTQ&ixlib=rb-1.2.1&fm=jpg&q=85&fit=crop&w=2560&h=1777","keywords":["Azure","Azure Synapse Analytics","datalake","Microsoft","parquet"],"articleSection":["Azure Synapse - Datalake"],"inLanguage":"fr-FR","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/","url":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/","name":"G\u00e9n\u00e9rer un dataset partitionn\u00e9 via un data flow Azure Synapse Analytics - 
NiceData","isPartOf":{"@id":"https:\/\/www.nicedata.fr\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#primaryimage"},"image":{"@id":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#primaryimage"},"thumbnailUrl":"https:\/\/images.unsplash.com\/photo-1544383835-bda2bc66a55d?ixid=MnwzNTY4Mjl8MHwxfGFsbHx8fHx8fHx8fDE2NjE5MzU1MTQ&ixlib=rb-1.2.1&fm=jpg&q=85&fit=crop&w=2560&h=1777","datePublished":"2022-09-01T06:52:07+00:00","dateModified":"2024-08-20T12:49:18+00:00","breadcrumb":{"@id":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#breadcrumb"},"inLanguage":"fr-FR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/"]}]},{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#primaryimage","url":"https:\/\/images.unsplash.com\/photo-1544383835-bda2bc66a55d?ixid=MnwzNTY4Mjl8MHwxfGFsbHx8fHx8fHx8fDE2NjE5MzU1MTQ&ixlib=rb-1.2.1&fm=jpg&q=85&fit=crop&w=2560&h=1777","contentUrl":"https:\/\/images.unsplash.com\/photo-1544383835-bda2bc66a55d?ixid=MnwzNTY4Mjl8MHwxfGFsbHx8fHx8fHx8fDE2NjE5MzU1MTQ&ixlib=rb-1.2.1&fm=jpg&q=85&fit=crop&w=2560&h=1777","width":2560,"height":1777,"caption":"Photo par Jan Antonin Kolar sur Unsplash"},{"@type":"BreadcrumbList","@id":"https:\/\/www.nicedata.fr\/index.php\/2022\/09\/01\/generer-un-dataset-partitionne-via-un-data-flow-azure-synapse-analytics\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Accueil","item":"https:\/\/www.nicedata.fr\/"},{"@type":"ListItem","position":2,"name":"G\u00e9n\u00e9rer un dataset 
partitionn\u00e9 via un data flow Azure Synapse Analytics"}]},{"@type":"WebSite","@id":"https:\/\/www.nicedata.fr\/#website","url":"https:\/\/www.nicedata.fr\/","name":"NiceData","description":"L&#039;expertise Data du sud","publisher":{"@id":"https:\/\/www.nicedata.fr\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.nicedata.fr\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"fr-FR"},{"@type":"Organization","@id":"https:\/\/www.nicedata.fr\/#organization","name":"NiceData","url":"https:\/\/www.nicedata.fr\/","logo":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/www.nicedata.fr\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/11\/NICE-DATA_JLFMod.webp?fit=2493%2C1249&ssl=1","contentUrl":"https:\/\/i0.wp.com\/www.nicedata.fr\/wp-content\/uploads\/2024\/11\/NICE-DATA_JLFMod.webp?fit=2493%2C1249&ssl=1","width":2493,"height":1249,"caption":"NiceData"},"image":{"@id":"https:\/\/www.nicedata.fr\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/JLFerralis"]},{"@type":"Person","@id":"https:\/\/www.nicedata.fr\/#\/schema\/person\/8d1ad38004d3b0cf6bff7c200c795e19","name":"Jean-Laurent Ferralis","image":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/secure.gravatar.com\/avatar\/3cbfdae273d44fb82b902a3451eb0db37485119e34cb70c1fdd186c9b731b9f3?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/3cbfdae273d44fb82b902a3451eb0db37485119e34cb70c1fdd186c9b731b9f3?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/3cbfdae273d44fb82b902a3451eb0db37485119e34cb70c1fdd186c9b731b9f3?s=96&d=mm&r=g","caption":"Jean-Laurent Ferralis"},"description":"French Data Professionnal - BI consultant and #sql lover. I also #swimbikerun when possible ! 
Living in @villedenice","sameAs":["http:\/\/xp-it.com"],"url":"https:\/\/www.nicedata.fr\/index.php\/author\/jlf\/"}]}},"jetpack_featured_media_url":"https:\/\/images.unsplash.com\/photo-1544383835-bda2bc66a55d?ixid=MnwzNTY4Mjl8MHwxfGFsbHx8fHx8fHx8fDE2NjE5MzU1MTQ&ixlib=rb-1.2.1&fm=jpg&q=85&fit=crop&w=2560&h=1777","jetpack-related-posts":[],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/posts\/215","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/comments?post=215"}],"version-history":[{"count":14,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/posts\/215\/revisions"}],"predecessor-version":[{"id":256,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/posts\/215\/revisions\/256"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/media\/252"}],"wp:attachment":[{"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/media?parent=215"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/categories?post=215"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nicedata.fr\/index.php\/wp-json\/wp\/v2\/tags?post=215"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}