In part III, we use the bar_chart_race package to create an animated chart for our data set.

Loading File
Data Cleaning
Monthly Posts v.s. Cumulative Sum
Animated Bar Chart with Bar-Chart-Race
- Customization: show total work count
- Animation

Loading File

In parts I and II, we prepared the data and saved it to two local CSV files. We'll load both files here.

# Load Python libraries
import pandas as pd

# Load rating.csv from part I
rating = pd.read_csv("rating.csv")

# preview file
rating

	id	type	name	canonical	cached_count	merger_id
0	9	Rating	Not Rated	True	825385	NaN
1	10	Rating	General Audiences	True	2115153	NaN
2	11	Rating	Teen And Up Audiences	True	2272688	NaN
3	12	Rating	Mature	True	1151260	NaN
4	13	Rating	Explicit	True	1238331	NaN
5	12766726	Rating	Teen & Up Audiences	False	333	NaN
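The last row of rating is a non-canonical variant of “Teen And Up Audiences” (id 12766726). Here is a minimal sketch of keeping only the five standard ratings by filtering on the canonical column. The small DataFrame copies a few columns from the preview above, so the example runs without rating.csv.

```python
import pandas as pd

# A few columns from the rating.csv preview above (values copied from the output)
rating = pd.DataFrame({
    "id": [9, 10, 11, 12, 13, 12766726],
    "name": ["Not Rated", "General Audiences", "Teen And Up Audiences",
             "Mature", "Explicit", "Teen & Up Audiences"],
    "canonical": [True, True, True, True, True, False],
})

# Keep only canonical rating tags; this drops the "Teen & Up Audiences" variant
canonical_ratings = rating[rating["canonical"]]
print(canonical_ratings[["id", "name"]])
```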

# Load rating_pivot.csv from part II
df = pd.read_csv("rating_pivot.csv")

# preview file
df

	creation date	9	10	11	12	13
0	2008-09-30	76	232	213	174	233
1	2008-10-31	38	111	93	43	196
2	2008-11-30	11	97	97	56	76
3	2008-12-31	2	93	47	41	56
4	2009-01-31	18	175	104	78	133
...	...	...	...	...	...	...
145	2020-10-31	14188	42416	47706	22015	28723
146	2020-11-30	13397	38003	42168	19005	21743
147	2020-12-31	15763	50443	51435	22664	26656
148	2021-01-31	16875	45592	51099	23830	27084
149	2021-02-28	15863	42624	46716	23610	25034

150 rows × 6 columns

Data Cleaning

There is still some data cleaning to do:

- Make the “creation date” column the index.
- Trim the “creation date” strings to the month. The column records the month in which works were created, but each string also includes the last day of that month (e.g. 2008-09-30).
- Rename the columns from tag IDs to tag names.

# Set index
df.set_index("creation date", inplace=True)
df

	9	10	11	12	13
creation date
2008-09-30	76	232	213	174	233
2008-10-31	38	111	93	43	196
2008-11-30	11	97	97	56	76
2008-12-31	2	93	47	41	56
2009-01-31	18	175	104	78	133
...	...	...	...	...	...
2020-10-31	14188	42416	47706	22015	28723
2020-11-30	13397	38003	42168	19005	21743
2020-12-31	15763	50443	51435	22664	26656
2021-01-31	16875	45592	51099	23830	27084
2021-02-28	15863	42624	46716	23610	25034

150 rows × 5 columns
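Alternatively, read_csv's index_col parameter can set the index while the file is loaded, which replaces the separate set_index step. In this sketch the CSV text is inlined from the first rows shown above, so the example runs on its own. Note that column labels read from a CSV header come back as strings ("9", "10", ...), which matters when the columns are renamed later.

```python
import io
import pandas as pd

# First rows of rating_pivot.csv, inlined so the sketch runs without the file
csv_text = """creation date,9,10,11,12,13
2008-09-30,76,232,213,174,233
2008-10-31,38,111,93,43,196
2008-11-30,11,97,97,56,76
"""

# index_col makes "creation date" the index while reading,
# equivalent to calling set_index("creation date") afterwards
df = pd.read_csv(io.StringIO(csv_text), index_col="creation date")
print(df)

# Column labels taken from the CSV header are strings, not integers
print(df.columns.tolist())
```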

# Remove day from date string
# Use .str to slice each index label as a string; [:-3] drops the trailing "-DD"
df.index = df.index.str[:-3]
df

	9	10	11	12	13
creation date
2008-09	76	232	213	174	233
2008-10	38	111	93	43	196
2008-11	11	97	97	56	76
2008-12	2	93	47	41	56
2009-01	18	175	104	78	133
...	...	...	...	...	...
2020-10	14188	42416	47706	22015	28723
2020-11	13397	38003	42168	19005	21743
2020-12	15763	50443	51435	22664	26656
2021-01	16875	45592	51099	23830	27084
2021-02	15863	42624	46716	23610	25034

150 rows × 5 columns
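Slicing with .str[:-3] assumes that every label has the fixed YYYY-MM-DD form. If the labels are valid dates, a length-independent alternative is to parse them with pd.to_datetime and format them back with strftime("%Y-%m"). The following sketch compares the two approaches on three sample labels:

```python
import pandas as pd

# Month-end date strings, as in the "creation date" index
index = pd.Index(["2008-09-30", "2008-10-31", "2021-02-28"], name="creation date")

# Approach used above: drop the last three characters ("-30", "-31", ...).
# This assumes every label is exactly in YYYY-MM-DD form.
sliced = index.str[:-3]

# Alternative: parse the strings as dates and format them as year-month,
# which does not depend on the string length
formatted = pd.to_datetime(index).strftime("%Y-%m")

print(sliced.tolist())     # ['2008-09', '2008-10', '2021-02']
print(formatted.tolist())  # ['2008-09', '2008-10', '2021-02']
```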

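# Change tag id to tag name
# rating.csv lists the five canonical ratings (ids 9-13) first, in the same order as the columns;
# we skip its last row, id 12766726 ("Teen & Up Audiences"), because it's a non-canonical duplicate
df.columns = rating["name"][:5]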
df

name	Not Rated	General Audiences	Teen And Up Audiences	Mature	Explicit
creation date
2008-09	76	232	213	174	233
2008-10	38	111	93	43	196
2008-11	11	97	97	56	76
2008-12	2	93	47	41	56
2009-01	18	175	104	78	133
...	...	...	...	...	...
2020-10	14188	42416	47706	22015	28723
2020-11	13397	38003	42168	19005	21743
2020-12	15763	50443	51435	22664	26656
2021-01	16875	45592	51099	23830	27084
2021-02	15863	42624	46716	23610	25034

150 rows × 5 columns
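Assigning rating["name"][:5] relies on the rows of rating.csv being in the same order as the pivot columns. A more defensive approach is to map each tag id to its name with rename. In this sketch the data is a one-row stand-in built from the tables above, and id_to_name is a name chosen for illustration.

```python
import pandas as pd

# Rating tags from rating.csv (id and name only)
rating = pd.DataFrame({
    "id": [9, 10, 11, 12, 13, 12766726],
    "name": ["Not Rated", "General Audiences", "Teen And Up Audiences",
             "Mature", "Explicit", "Teen & Up Audiences"],
})

# Pivot table whose column labels are tag ids read from CSV as strings
df = pd.DataFrame(
    [[76, 232, 213, 174, 233]],
    index=pd.Index(["2008-09"], name="creation date"),
    columns=["9", "10", "11", "12", "13"],
)

# Map each id (as a string, to match the CSV header) to its tag name,
# so the result does not depend on the row order of rating.csv.
# Ids with no matching column (such as 12766726) are simply ignored.
id_to_name = dict(zip(rating["id"].astype(str), rating["name"]))
df = df.rename(columns=id_to_name)
print(df.columns.tolist())
```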

Monthly Posts v.s. Cumulative Sum

We have two options here. We can plot the number of works posted each month, which is what the DataFrame currently holds for each rating category, or we can plot the cumulative sum of posts. The final row of the cumulative sum should come very close to the total number of works on AO3 at the time of the data dump, but it will not match exactly. In previous posts, while cleaning the data set, we dropped some works because of N/A values or duplicates.

# Cumulative sum
df_cumsum = df.cumsum()
df_cumsum

name	Not Rated	General Audiences	Teen And Up Audiences	Mature	Explicit
creation date
2008-09	76	232	213	174	233
2008-10	114	343	306	217	429
2008-11	125	440	403	273	505
2008-12	127	533	450	314	561
2009-01	145	708	554	392	694
...	...	...	...	...	...
2020-10	651539	1895727	2006772	1014066	1081494
2020-11	664936	1933730	2048940	1033071	1103237
2020-12	680699	1984173	2100375	1055735	1129893
2021-01	697574	2029765	2151474	1079565	1156977
2021-02	713437	2072389	2198190	1103175	1182011

150 rows × 5 columns
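The following sketch shows what cumsum does, using the first five months of two columns from the monthly table. Each cumulative value is the sum of that month and all earlier months, so the last cumulative row equals the column totals. The results match the first rows of df_cumsum above.

```python
import pandas as pd

# First five months of two rating columns, copied from the monthly table above
monthly = pd.DataFrame(
    {"Not Rated": [76, 38, 11, 2, 18],
     "General Audiences": [232, 111, 97, 93, 175]},
    index=pd.Index(["2008-09", "2008-10", "2008-11", "2008-12", "2009-01"],
                   name="creation date"),
)

# Running total per column: each row adds that month's posts to all earlier months
running = monthly.cumsum()
print(running)

# The last cumulative row equals the column totals over the whole period
print((running.iloc[-1] == monthly.sum()).all())  # True
```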

# Export to local csv file
df_cumsum.to_csv("rating-cumsum.csv")
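In a later session, the exported file can be loaded back without repeating the cleaning. This sketch shows the round trip. It uses an in-memory buffer (io.StringIO) in place of rating-cumsum.csv and index_col to restore “creation date” as the index.

```python
import io
import pandas as pd

# Two months of the cumulative table from above
df_cumsum = pd.DataFrame(
    {"Not Rated": [76, 114], "Explicit": [233, 429]},
    index=pd.Index(["2008-09", "2008-10"], name="creation date"),
)

# Write to an in-memory buffer instead of rating-cumsum.csv, then read it back;
# index_col restores "creation date" as the index
buffer = io.StringIO()
df_cumsum.to_csv(buffer)
buffer.seek(0)
reloaded = pd.read_csv(buffer, index_col="creation date")
print(reloaded)
```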

Animated Bar Chart with Bar-Chart-Race

We use the bar_chart_race package to automate the animation process. You can, of course, create the whole chart from scratch instead. The package's author has also published more tutorials on using it.

# Load the package
import bar_chart_race as bcr

# Load gc so we can manually release memory; rendering the animation can crash Jupyter Notebook if memory runs low
import gc

# Clear memory
gc.collect()

Customization: show total work count

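# Function to show total work count
# From bar-chart-race tutorial
# bar_chart_race calls it each period with the current values and ranks
def summary(values, ranks):
    # Sum across all ratings in this period = total works posted so far
    total_works = values.sum()
    s = f'Total Works - {total_works:,.0f}'
    # The returned dict is passed to matplotlib's text(): x/y set the position, s is the string
    return {'x': .99, 'y': .05, 's': s, 'ha': 'right', 'size': 8}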
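You can call the summary function by hand to check its output before rendering. This sketch passes it the final row of df_cumsum (2021-02), copied from the table above. Because summary ignores ranks, None stands in for it here.

```python
import pandas as pd

# Same function as above: bar_chart_race calls it with the current period's
# values and ranks and draws the returned text on the chart
def summary(values, ranks):
    total_works = values.sum()
    s = f'Total Works - {total_works:,.0f}'
    return {'x': .99, 'y': .05, 's': s, 'ha': 'right', 'size': 8}

# Final row of df_cumsum (2021-02), copied from the table above
last_period = pd.Series(
    [713437, 2072389, 2198190, 1103175, 1182011],
    index=["Not Rated", "General Audiences", "Teen And Up Audiences",
           "Mature", "Explicit"],
)

# ranks is unused by summary, so None is enough for this manual call
print(summary(last_period, None)['s'])  # Total Works - 7,269,202
```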

Animation

# filename=None in order to display in Jupyter Notebook cell
# period_summary_func=summary to show total work count
bcr.bar_chart_race(df=df_cumsum, filename=None, period_summary_func=summary, title='AO3 Works Rating Breakdown \n 2008-2021')

/home/pi/.local/lib/python3.7/site-packages/bar_chart_race/_make_chart.py:286: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_yticklabels(self.df_values.columns)
/home/pi/.local/lib/python3.7/site-packages/bar_chart_race/_make_chart.py:287: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_xticklabels([max_val] * len(ax.get_xticks()))