Sankey Plots

Introduction

Sankey plots are visualizations that represent the flow between two or more entities. Because each link is drawn in proportion to the quantity it carries, they are well suited to datasets such as energy or financial data.

Let's dive into the fundamentals and key features of Sankey Plots with the help of some code examples covering a range of different datasets.


Key Features & Explanation

Sankey diagrams consist of three key components:

Nodes are positioned according to their place in the hierarchy, and the width of each link represents the value of the flow between the two nodes it connects.
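To make the node, link, and value relationship concrete, here is a minimal sketch in plain Python. The node names and amounts are hypothetical and chosen only for illustration; the point is that every link is just a source node, a target node, and a value, and that the value determines how wide the link is drawn relative to the others.

```python
# Hypothetical three-node example: names and amounts are illustrative only.
nodes = ["Salary", "Rent", "Savings"]

# Each link is (source index, target index, value); indices point into `nodes`.
links = [
    (0, 1, 700),  # Salary -> Rent
    (0, 2, 300),  # Salary -> Savings
]

total_flow = sum(value for _, _, value in links)


def describe(link):
    """Render one link as text, with its share of the total flow."""
    source, target, value = link
    share = value / total_flow
    return f"{nodes[source]} -> {nodes[target]}: {value} ({share:.0%} of total)"


for link in links:
    print(describe(link))

# The same information as three parallel lists, which is the shape that
# the Plotly examples later in this article pass to go.Sankey.
sources, targets, values = (list(column) for column in zip(*links))
```

In a rendered diagram, the "Salary -> Rent" link would be drawn a little more than twice as wide as the "Salary -> Savings" link, since it carries 700 of the 1000 units.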

Key Components of Sankey Plot

How to create Sankey diagrams

Online Tools

Online tools like SankeyMATIC and Displayr are very useful in creating Sankey plots.

Python

Several Python libraries can create Sankey plots, including matplotlib, Plotly, and holoviews. We will be using Plotly to create Sankey diagrams.


Installation & Setup for Plotly

pip install plotly

Now, import the necessary modules:

import plotly.graph_objects as go

Code Examples

Example 1: Manually Inserted Values

import plotly.graph_objects as go
import plotly.express as px
labels = [
    "Total Tax Collected", "Infrastructure", "Defense", "Education", "Healthcare",
    "Subsidies & Welfare", "Interest Payments on Debt", "Others",
    "Roads & Highways", "Railways", "Urban Development",
    "Army", "Navy", "Air Force",
    "Primary & Secondary Education", "Higher Education",
    "Public Hospitals", "Vaccination Programs",
    "Food & Agriculture Subsidies", "Rural Employment", "Pension & Social Welfare",
    "Law Enforcement, Admin, etc."
]
        
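# Each entry is the index in `labels` of the node where a link starts.
sources = [0, 0, 0, 0, 0, 0, 0,
            1, 1, 1,
            2, 2, 2,
            3, 3,
            4, 4,
            5, 5, 5,
            7]  # "Law Enforcement, Admin, etc." is the breakdown of "Others" (index 7)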
        
targets = [1, 2, 3, 4, 5, 6, 7,
            8, 9, 10,
            11, 12, 13,
            14, 15,
            16, 17,
            18, 19, 20,
            21]

values = [30, 15, 12, 10, 18, 10, 5,
            12, 8, 10,
            8, 3, 4,
            7, 5,
            6, 4,
            8, 6, 4,
            5]

color_palette = px.colors.qualitative.Set2
link_colors = [color_palette[i % len(color_palette)] for i in sources]

fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=20, 
        thickness=30, 
        line=dict(color="black", width=0.5),
        label=labels,
        color="lightgray",
    ),
    link=dict(
        source=sources,
        target=targets,
        value=values,
        color=link_colors
    )
)])

fig.update_layout(
    title_text="India's Tax Revenue Allocation (Sample Data)",
    font_size=12,
    width=1000,
    height=700
)

fig.show()    

Explanation of Code:
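One way to read the code above: `labels` defines the nodes, and position `i` in `sources`, `targets`, and `values` together describes one link, from `labels[sources[i]]` to `labels[targets[i]]`, carrying `values[i]`. The sketch below uses an excerpt of the sample data to print the links in readable form and to check that every intermediate node passes on exactly what it receives. The helper name `flow_imbalances` is an assumption made for this illustration and is not part of Plotly.

```python
from collections import defaultdict

# Excerpt of the Example 1 sample data, re-indexed so the block is self-contained.
labels = [
    "Total Tax Collected", "Infrastructure", "Defense",
    "Roads & Highways", "Railways", "Urban Development",
    "Army", "Navy", "Air Force",
]
sources = [0, 0, 1, 1, 1, 2, 2, 2]
targets = [1, 2, 3, 4, 5, 6, 7, 8]
values = [30, 15, 12, 8, 10, 8, 3, 4]


def flow_imbalances(sources, targets, values):
    """Return {node: (inflow, outflow)} for intermediate nodes whose totals differ."""
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for source, target, value in zip(sources, targets, values):
        outflow[source] += value
        inflow[target] += value
    # Only nodes with both incoming and outgoing links can be compared.
    intermediate = sorted(set(inflow) & set(outflow))
    return {
        node: (inflow[node], outflow[node])
        for node in intermediate
        if inflow[node] != outflow[node]
    }


# Print every link in readable form.
for source, target, value in zip(sources, targets, values):
    print(f"{labels[source]} -> {labels[target]}: {value}")

# An empty dict means every intermediate node is balanced.
print(flow_imbalances(sources, targets, values))
```

A Sankey diagram does not strictly require balanced nodes, but an unexpected imbalance is often a sign of a mistyped index or value, so a check like this can be worth running before plotting.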

Plot 1

Example 2: Using a CSV File

import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv("students_career_journey.csv")

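# Collect every unique node name. Sorting keeps the node order, and therefore
# the color assigned to each node, stable between runs.
nodes = sorted(set(df["Source"]).union(set(df["Target"])))
# Map each node name to its integer index, as go.Sankey expects.
node_dict = {name: i for i, name in enumerate(nodes)}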

sources = df["Source"].map(node_dict)
targets = df["Target"].map(node_dict)
values = df["Value"]

# Base palette for the nodes; it is cycled if there are more nodes than colors.
palette = [
    "rgba(255, 99, 132, 0.9)",
    "rgba(54, 162, 235, 0.9)",
    "rgba(255, 206, 86, 0.9)",
    "rgba(75, 192, 192, 0.9)",
    "rgba(153, 102, 255, 0.9)",
    "rgba(255, 159, 64, 0.9)",
    "rgba(255, 69, 0, 0.9)",
    "rgba(0, 255, 127, 0.9)",
    "rgba(30, 144, 255, 0.9)",
    "rgba(128, 0, 128, 0.9)",
    "rgba(0, 206, 209, 0.9)",
]
node_colors = [palette[i % len(palette)] for i in range(len(nodes))]

# Give each link its own color by shifting the RGB channels with the link index,
# clamping every channel to the valid 0-255 range.
link_colors = [
    f"rgba({max(0, 255 - i * 7)}, {min(255, 100 + i * 4)}, {max(0, 200 - i * 5)}, 0.6)"
    for i in range(len(values))
]

fig = go.Figure(go.Sankey(
    node=dict(
        pad=20, thickness=20, line=dict(color="black", width=0.5),
        label=nodes, color=node_colors
    ),
    link=dict(
        source=sources, target=targets, value=values,
        color=link_colors
    )
))

# Dark theme: white text on a black background.
fig.update_layout(
    title_text="Complex Career Journey Sankey",
    font=dict(size=12, color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black"
)

fig.show()
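The script above expects a CSV file with `Source`, `Target`, and `Value` columns, but the file itself is not reproduced in this article. As a stand-in, the sketch below writes a small sample file using only the standard library. The rows and the file name `students_career_journey_sample.csv` are assumptions made for illustration; only the three column names come from the script.

```python
import csv

# Hypothetical sample rows; the real students_career_journey.csv is not shown
# in this article. Only the column names come from the script above.
rows = [
    ("High School", "Science Stream", 250),
    ("High School", "Commerce Stream", 180),
    ("High School", "Arts Stream", 120),
    ("Science Stream", "Engineering", 150),
    ("Science Stream", "Medicine", 70),
    ("Commerce Stream", "Business", 100),
    ("Arts Stream", "Media", 60),
]

sample_path = "students_career_journey_sample.csv"  # assumed file name

# Write a header row followed by one row per link.
with open(sample_path, "w", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Value"])
    writer.writerows(rows)

# Read the file back to confirm that it has the expected shape.
with open(sample_path, newline="", encoding="utf-8") as handle:
    loaded = list(csv.DictReader(handle))

print(len(loaded), "links;", "columns:", list(loaded[0].keys()))
```

To try Example 2 with this file, pass `sample_path` to `pd.read_csv` in place of the original file name.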

Plot 2:

Sankey Diagram Using SankeyMATIC

Interface of SankeyMATIC

Steps for using SankeyMATIC


Code Examples

Text file:

High School[250]Science Stream  
High School[180]Commerce Stream  
High School[120]Arts Stream  
Science Stream[150]Engineering  
Science Stream[70]Medicine  
Science Stream[10]Research  
Science Stream[20]Dropped Out  
Commerce Stream[100]Business  
Commerce Stream[50]Finance  
Commerce Stream[20]Law  
Commerce Stream[10]Dropped Out  
Arts Stream[60]Media  
Arts Stream[40]Teaching  
Arts Stream[15]Government Jobs  
Arts Stream[5]Dropped Out  
Engineering[90]Tech Jobs  
Engineering[40]Startup  
Engineering[10]Higher Studies  
Engineering[10]Government Jobs  
Medicine[60]Doctor  
Medicine[5]Research  
Medicine[5]Dropped Out  
Business[80]Corporate  
Business[15]Entrepreneurship  
Business[5]Dropped Out  
Finance[25]Banking  
Finance[15]Investment  
Finance[5]Dropped Out  
Media[30]Journalism  
Media[20]Content Creation  
Media[10]Advertising  
Teaching[25]School Teacher  
Teaching[10]Professor  
Teaching[5]Dropped Out  
Government Jobs[15]Civil Services  
Government Jobs[10]Defense  
Tech Jobs[60]Software Engineer  
Tech Jobs[20]Data Science  
Tech Jobs[10]Freelancing  
Startup[10]Success  
Startup[30]Failure  
Higher Studies[5]PhD  
Higher Studies[5]MBA
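The same text can also be reused from Python. The sketch below parses lines in the simple `Source[amount]Target` form shown above into the node labels and the parallel source, target, and value lists used in the Plotly examples. It is not SankeyMATIC's own parser and handles only this basic line form; the function names `parse_flows` and `to_index_lists` are assumptions made for this illustration.

```python
import re

# Matches "Source[amount]Target", tolerating optional spaces around each part.
FLOW_LINE = re.compile(r"^\s*(.+?)\s*\[\s*(\d+(?:\.\d+)?)\s*\]\s*(.+?)\s*$")


def parse_flows(text):
    """Return a list of (source name, amount, target name) tuples."""
    flows = []
    for line in text.splitlines():
        match = FLOW_LINE.match(line)
        if match:  # silently skip blank or unrecognized lines
            source, amount, target = match.groups()
            flows.append((source, float(amount), target))
    return flows


def to_index_lists(flows):
    """Convert named flows to labels plus index-based source/target/value lists."""
    index = {}  # node name -> position, in order of first appearance
    sources, targets, values = [], [], []
    for source, amount, target in flows:
        for name in (source, target):
            index.setdefault(name, len(index))
        sources.append(index[source])
        targets.append(index[target])
        values.append(amount)
    return list(index), sources, targets, values


# First few lines of the listing above, including their trailing spaces.
sample = """High School[250]Science Stream  
High School[180]Commerce Stream  
High School[120]Arts Stream  
Science Stream[150]Engineering  
"""

labels, sources, targets, values = to_index_lists(parse_flows(sample))
print(labels)
print(sources, targets, values)
```

The resulting lists can be passed to `go.Sankey` as `label`, `source`, `target`, and `value`, in the same way as in the Plotly examples.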

After further refining the Sankey plot's visual settings, we get the following plot:

Plot:

Use Cases

Sankey diagrams are widely used in various fields, including:


Conclusion

Sankey diagrams are effective visualization tools for a wide range of datasets, and they are particularly good at representing complex flows and relationships. In energy management, Sankey plots can visualize energy flows and losses in a system, helping to identify areas for efficiency improvements. In marketing, they are used to track how customers' investments are distributed across different modes. Financial analysts use them to break down income statements and visualize cash flows. In supply chain management, they help in optimizing each stage of the process, thus improving efficiency. This ability to represent complex data relationships makes them powerful tools for data analysis and visualization across numerous fields.

References & Further Reading