Data Practice 5

Data Practice 5

Introduciton

One site requires SMS validation after registration.

Email SMS
time time
user_id user_id
email_id text_id
action

Questions

  • Get daily registration count.
  • Get the number of registered users who passed the SMS validation
  • Get the number of people who validated on next day.
  • Get the avarage time between registration to validation.

Answers

  • Follow up question: does one user only have one registration email?
1
2
3
# Get daily registration count
email_df.loc[:, 'reg_date'] = email_df.time.dt.date
daily_reg_count = email_df.groupby('reg_date').user_id.count()
1
2
3
4
5
# Get the number of registered users who passed the SMS validation
validated_sms_df = sms_df.loc[sms_df.action == 'validated', :]
joined_df = email_df.merge(validated_sms_df, on='user_id', how='left')
validated_user = joined_df.loc[joined_df.action.notnull(), :]
validated_user_count = validated_user.user_id.count()
1
2
3
4
5
6
# Get the number of people who validated on next day.
email_df.loc[:, 'reg_date'] = email_df.time.dt.date
validated_sms_df = sms_df.loc[sms_df.action == 'validated', :].copy()
validated_sms_df.loc[:, 'validate_date'] = validated_sms_df.time.dt.date
joined_df = validated_sms_df.merge(email_df, on='user_id', how='left')
next_day_validation_count = joined_df.loc[(joined_df.validate_date - joined_df.reg_date) == pd.Timedelta(days=1), :].user_id.count()
1
2
# Get the avarage time between registration to validation.
avg_validation_wait = (joined_df.time_x - joined_df.time_y).mean()

Data Preperation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
import pandas as pd
email_df = pd.DataFrame(
{
'time': pd.to_datetime(['2018-05-01 08:00', '2018-05-02 08:20', '2018-05-02 08:50']),
'user_id': ['001', '002', '003'],
'email_id': ['1@a', '2@b', '3@c']
}
)

sms_df = pd.DataFrame(
{
'time': pd.to_datetime(['2018-05-01 08:05', '2018-05-02 08:25', '2018-05-03 01:12']),
'user_id': ['001', '002', '002'],
'text_id': ['1', '2', '3'],
'action': ['validated', 'failed', 'validated']
}
)