Sankey plots are visualization tools that represent the flow of a quantity between two or more entities. Because they show how much flows from each entity to the next, they are ideal for representing datasets such as energy or financial data.
Let's dive into the fundamentals and key features of Sankey plots with the help of code examples covering a range of datasets.
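Sankey diagrams consist of three key components:
- Nodes: the entities between which the quantity flows.
- Links: the connections that carry the flow from one node to another.
- Values: the size of each flow; the width of a link represents the value of the flow between its two nodes.
Nodes are placed at different positions based on their hierarchy.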
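The link values also determine how tall each node is drawn. The sketch below computes node sizes from the link lists, assuming the convention of d3-sankey (the layout library Plotly's Sankey trace builds on), where a node's size is the larger of its total inflow and total outflow. The function name `node_totals` is our own, not part of Plotly.

```python
from collections import defaultdict


def node_totals(sources, targets, values):
    """Return {node_index: size}, where size = max(total inflow, total outflow).

    Assumes the d3-sankey sizing convention; link i carries values[i]
    from node sources[i] to node targets[i].
    """
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for s, t, v in zip(sources, targets, values):
        outflow[s] += v  # the link leaves its source node
        inflow[t] += v   # and enters its target node
    nodes = set(inflow) | set(outflow)
    return {n: max(inflow[n], outflow[n]) for n in sorted(nodes)}


# Three nodes: 0 -> 1 (30 units), 0 -> 2 (15 units), 1 -> 2 (10 units)
print(node_totals([0, 0, 1], [1, 2, 2], [30, 15, 10]))  # → {0: 45, 1: 30, 2: 25}
```

Node 1 receives 30 units but forwards only 10, so under this convention it is still drawn at size 30; the remaining 20 units simply end at that node.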
Online tools such as SankeyMATIC and Displayr are useful for creating Sankey plots.
In Python, several libraries can create Sankey plots, including Matplotlib, Plotly, and HoloViews. We will use Plotly to create Sankey diagrams. Install it with pip:
```bash
pip install plotly
```
Now, import the necessary modules:
```python
import plotly.graph_objects as go
```
The first example uses sample data to show how total tax collected is split into spending categories and then into sub-categories:

```python
import plotly.graph_objects as go
import plotly.express as px

# Node labels: a label's position in this list is its node index
labels = [
    "Total Tax Collected",                                 # 0
    # Spending categories (1-7)
    "Infrastructure", "Defense", "Education", "Healthcare",
    "Subsidies & Welfare", "Interest Payments on Debt", "Others",
    # Sub-categories (8-21)
    "Roads & Highways", "Railways", "Urban Development",   # from Infrastructure
    "Army", "Navy", "Air Force",                           # from Defense
    "Primary & Secondary Education", "Higher Education",  # from Education
    "Public Hospitals", "Vaccination Programs",           # from Healthcare
    "Food & Agriculture Subsidies", "Rural Employment",
    "Pension & Social Welfare",                           # from Subsidies & Welfare
    "Law Enforcement, Admin, etc.",                       # from Others
]

# Link i carries values[i] from node sources[i] to node targets[i]
sources = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 7]
targets = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
values = [30, 15, 12, 10, 18, 10, 5, 12, 8, 10, 8, 3, 4, 7, 5, 6, 4, 8, 6, 4, 5]

# Color each link according to its source node
color_palette = px.colors.qualitative.Set2
link_colors = [color_palette[i % len(color_palette)] for i in sources]

fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=20,        # padding between nodes (px)
        thickness=30,  # node thickness (px)
        line=dict(color="black", width=0.5),
        label=labels,
        color="lightgray",
    ),
    link=dict(
        source=sources,
        target=targets,
        value=values,
        color=link_colors,
    ),
)])

fig.update_layout(
    title_text="India's Tax Revenue Allocation (Sample Data)",
    font_size=12,
    width=1000,
    height=700,
)
fig.show()
```
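In a hierarchical breakdown like this, every intermediate node should pass on exactly what it receives, and a single mis-typed index in `sources` or `targets` silently breaks that. A quick check before plotting can catch such mistakes. The helper `unbalanced_nodes` below is a hypothetical sketch, not part of Plotly; it reports nodes that have both incoming and outgoing links whose totals differ.

```python
import math
from collections import defaultdict


def unbalanced_nodes(sources, targets, values):
    """Return {node: (inflow, outflow)} for intermediate nodes whose totals differ.

    Root nodes (no inflow) and leaf nodes (no outflow) are skipped,
    since they are expected to be one-sided.
    """
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for s, t, v in zip(sources, targets, values):
        outflow[s] += v
        inflow[t] += v
    return {
        n: (inflow[n], outflow[n])
        for n in set(inflow) & set(outflow)  # nodes with links on both sides
        if not math.isclose(inflow[n], outflow[n])
    }


# Node 1 receives 10 but forwards only 5, so it is reported.
print(unbalanced_nodes([0, 1], [1, 2], [10, 5]))  # → {1: (10, 5)}
```

Running it on the example's `sources`, `targets`, and `values` before calling `go.Sankey` flags any category whose sub-items do not add up to its allocation.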
The second example reads a career-journey dataset from a CSV file with `Source`, `Target`, and `Value` columns and draws it on a dark background:

```python
import pandas as pd
import plotly.graph_objects as go

# Each CSV row is one flow with columns Source, Target, Value
df = pd.read_csv("students_career_journey.csv")

# Unique node names in order of first appearance (a set would give an arbitrary order)
nodes = list(dict.fromkeys(list(df["Source"]) + list(df["Target"])))
node_dict = {name: i for i, name in enumerate(nodes)}

# Plotly's link source/target arrays hold integer node indices
sources = df["Source"].map(node_dict).tolist()
targets = df["Target"].map(node_dict).tolist()
values = df["Value"].tolist()

# Repeat the 11-color palette until every node has a color, then trim
node_colors = [
    "rgba(255, 99, 132, 0.9)", "rgba(54, 162, 235, 0.9)",
    "rgba(255, 206, 86, 0.9)", "rgba(75, 192, 192, 0.9)",
    "rgba(153, 102, 255, 0.9)", "rgba(255, 159, 64, 0.9)",
    "rgba(255, 69, 0, 0.9)", "rgba(0, 255, 127, 0.9)",
    "rgba(30, 144, 255, 0.9)", "rgba(128, 0, 128, 0.9)",
    "rgba(0, 206, 209, 0.9)",
] * (len(nodes) // 11 + 1)
node_colors = node_colors[:len(nodes)]

# Semi-transparent link colors forming a gradient over the link order
link_colors = [
    f"rgba({max(0, 255 - i * 7)}, {min(255, 100 + i * 4)}, {max(0, 200 - i * 5)}, 0.6)"
    for i in range(len(values))
]

fig = go.Figure(go.Sankey(
    node=dict(
        pad=20,
        thickness=20,
        line=dict(color="black", width=0.5),
        label=nodes,
        color=node_colors,
    ),
    link=dict(
        source=sources,
        target=targets,
        value=values,
        color=link_colors,
    ),
))

fig.update_layout(
    title_text="Complex Career Journey Sankey",
    font=dict(size=12, color="white"),
    plot_bgcolor="black",
    paper_bgcolor="black",
)
fig.show()
```
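The gradient in this example colors each link by its position in the list, so a link's color is unrelated to the nodes it connects. A common alternative is to reuse the source node's color with lower opacity, so each flow visibly belongs to the node it leaves. The helper `links_from_node_colors` below is a hypothetical sketch that assumes node colors are `rgba(r, g, b, a)` strings like those in `node_colors`.

```python
import re

# Matches strings such as "rgba(255, 99, 132, 0.9)" and captures r, g, b
RGBA = re.compile(r"rgba\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*[\d.]+\s*\)")


def links_from_node_colors(node_colors, sources, alpha=0.4):
    """Color each link like its source node, but with opacity `alpha`."""
    link_colors = []
    for s in sources:
        match = RGBA.fullmatch(node_colors[s])
        if match is None:
            raise ValueError(f"not an rgba() color: {node_colors[s]!r}")
        r, g, b = match.groups()
        link_colors.append(f"rgba({r}, {g}, {b}, {alpha})")
    return link_colors


node_colors = ["rgba(255, 99, 132, 0.9)", "rgba(54, 162, 235, 0.9)"]
print(links_from_node_colors(node_colors, [0, 1, 0]))
# → ['rgba(255, 99, 132, 0.4)', 'rgba(54, 162, 235, 0.4)', 'rgba(255, 99, 132, 0.4)']
```

In the career-journey code, the result of `links_from_node_colors(node_colors, sources)` could be passed as the link `color` in place of the gradient.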
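To build the diagram in SankeyMATIC instead, enter the flows as text, one per line, in this format:
```text
Source[Value]Target
```
Each line creates a flow from the source node to the target node, with a thickness proportional to the given value.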
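Text in this format can also feed the Plotly code above. As a sketch, assuming one flow per line and numeric values, the hypothetical helpers `parse_flows` and `flows_to_csv` below (our own minimal parser, not SankeyMATIC's) turn the text into the `Source`, `Target`, `Value` columns that the CSV example reads.

```python
import csv
import io
import re

# One flow per line: "Source[Value]Target"; spaces around the brackets are tolerated
FLOW_LINE = re.compile(r"^\s*(.+?)\s*\[\s*(\d+(?:\.\d+)?)\s*\]\s*(.+?)\s*$")


def parse_flows(text):
    """Turn "Source[Value]Target" lines into (source, target, value) tuples."""
    flows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        match = FLOW_LINE.match(line)
        if match is None:
            raise ValueError(f"line {line_no} is not in Source[Value]Target form: {line!r}")
        source, value, target = match.groups()
        flows.append((source, target, float(value)))
    return flows


def flows_to_csv(flows):
    """Serialize flows as CSV text with a Source, Target, Value header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Value"])
    writer.writerows(flows)
    return buffer.getvalue()


sample = "High School[250]Science Stream\nScience Stream[150]Engineering\n"
print(parse_flows(sample))
# → [('High School', 'Science Stream', 250.0), ('Science Stream', 'Engineering', 150.0)]
```

Writing `flows_to_csv(parse_flows(text))` to a file opened with `newline=""` gives a CSV in the shape that example expects; whether it matches the author's `students_career_journey.csv` exactly is an assumption.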
Text file:
```text
High School[250]Science Stream
High School[180]Commerce Stream
High School[120]Arts Stream
Science Stream[150]Engineering
Science Stream[70]Medicine
Science Stream[10]Research
Science Stream[20]Dropped Out
Commerce Stream[100]Business
Commerce Stream[50]Finance
Commerce Stream[20]Law
Commerce Stream[10]Dropped Out
Arts Stream[60]Media
Arts Stream[40]Teaching
Arts Stream[15]Government Jobs
Arts Stream[5]Dropped Out
Engineering[90]Tech Jobs
Engineering[40]Startup
Engineering[10]Higher Studies
Engineering[10]Government Jobs
Medicine[60]Doctor
Medicine[5]Research
Medicine[5]Dropped Out
Business[80]Corporate
Business[15]Entrepreneurship
Business[5]Dropped Out
Finance[25]Banking
Finance[15]Investment
Finance[5]Dropped Out
Media[30]Journalism
Media[20]Content Creation
Media[10]Advertising
Teaching[25]School Teacher
Teaching[10]Professor
Teaching[5]Dropped Out
Government Jobs[15]Civil Services
Government Jobs[10]Defense
Tech Jobs[60]Software Engineer
Tech Jobs[20]Data Science
Tech Jobs[10]Freelancing
Startup[10]Success
Startup[30]Failure
Higher Studies[5]PhD
Higher Studies[5]MBA
```
After further refining the Sankey plot's visual components, we get the following plot:
Sankey diagrams are widely used in various fields, including:
- Energy management: visualizing energy flows and losses in a system, which helps identify where efficiency can be improved.
- Marketing: tracking how customers' spending is distributed across different channels.
- Finance: breaking down income statements and visualizing cash flows.
- Supply chain management: following flows through each stage of the process so that each stage can be optimized.
Because they represent complex flows and relationships so clearly, Sankey diagrams are powerful tools for data analysis and visualization across many fields.