Analyze My Running Data on AWS - Part 2

In the first post of this series, I took Garmin .fit files from about 7 years of running, converted them to .csv and put them into a MySQL database using AWS services like Lambda, SQS, and RDS.

I've gone ahead and created some tables in MySQL that do some aggregations around run statistics and heart rate data. I'll use these tables to pull in the data that I'll use to analyze my runs in this post and others.

The first thing I want to know is what my weekly mileage has been from 2019 onward. To get it, I'm going to query the run_statistics table in my MySQL database. Then I'll merge the results with a dataframe containing every date between the min and max dates of the run_statistics table, inclusive, so that weeks without any runs still show up.

Finally I'll graph the weekly mileage to see how much I've been running.
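As a minimal sketch of that fill-in step (using hypothetical run data, not my actual table), merging onto a complete date range leaves NaN rows for the days without runs:

import pandas as pd

# Hypothetical runs with a gap on 2020-01-02
rs = pd.DataFrame({
  'run_date': pd.to_datetime(['2020-01-01', '2020-01-03']),
  'distance': [5000.0, 8000.0],  # meters
})

# Build every date between min and max, then left-merge the runs onto it
all_dates = pd.DataFrame({'run_date': pd.date_range(rs['run_date'].min(),
                                                    rs['run_date'].max())})
filled = all_dates.merge(rs, how='left', on='run_date')
print(filled)
# The gap day appears with NaN distance, so zero-run weeks survive aggregation
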

First I'll define a query function — I've left out the secrets that I use in AWS.

import pandas as pd
import pymysql

def query(q):

  # Connection details left out intentionally
  _host = ''
  _port = 3306
  _user = ''
  _pass = ''
  _database = ''

  conn = pymysql.connect(host=_host,
                         port=_port,
                         user=_user,
                         password=_pass,
                         db=_database)

  df = pd.read_sql(q, conn)
  conn.close()
  return df

Now execute the query to get the aggregated run statistics.

rs_query = 'SELECT * FROM garmin.run_statistics'
rs_df = query(rs_query)
rs_df['run_date'] = pd.to_datetime(rs_df['run_date'])
rs_df.head()


The output for 2019 and beyond is below. I did a fair amount of running through the summer, and it trickled off in the fall. My mileage and number of runs have increased in 2020 as the weather has gotten better and we've been under the Coronavirus "shelter in place" order.

The code to produce this is below.
# Merge the individual run statistics with a full date range to fill in
# missing days, then calculate and plot the weekly statistics
import matplotlib.pyplot as plt
import boto3

def weekly_mileage(rs_df, dates_column='run_date', distance_column='distance'):
  min_date = rs_df[dates_column].min()
  max_date = rs_df[dates_column].max()
  all_dates = pd.date_range(min_date, max_date)

  # Left-merge the runs onto the complete date range so missing days survive
  dates_df = pd.DataFrame(all_dates, columns=[dates_column])
  weekly_rs_df = pd.merge(left=dates_df, right=rs_df, how='left', on=dates_column)

  weekly_rs_df['week'] = weekly_rs_df[dates_column].dt.isocalendar().week
  weekly_rs_df['year'] = weekly_rs_df[dates_column].dt.year

  weekly_gb_df = weekly_rs_df[weekly_rs_df['year'] >= 2019].groupby(['year', 'week'])

  fig, axes = plt.subplots(figsize=(16, 4))
  weekly_summed = weekly_gb_df[distance_column].sum() / 1609.34  # meters to miles
  weekly_summed.plot(kind='bar', color='purple', ax=axes)
  axes.set_title('Weekly Mileage (miles)')

  # Save the chart and upload it to S3
  fname = 'weekly_mileage.jpg'
  bucket = ''
  prefix = 'charts/'
  plt.savefig(fname)
  s3_client = boto3.client('s3')
  s3_client.upload_file(fname, bucket, prefix + fname)

weekly_mileage(rs_df)
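One caveat with grouping by week number (my own note, not part of the code above): pandas' ISO week numbering can assign the first days of January to week 52 or 53 of the previous ISO year, which can put those days in an odd-looking bar when paired with the calendar year. A quick check on hypothetical dates shows the effect:

import pandas as pd

d = pd.Series(pd.to_datetime(['2021-01-01', '2021-01-04']))
weeks = d.dt.isocalendar().week
print(weeks.tolist())  # [53, 1] — Jan 1, 2021 falls in ISO week 53 of 2020
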

I'll apply some more analysis to it in future posts.