
Counting distinct values is just as simple: data['network'].nunique() returns the number of non-null unique network entries.

The need for custom functions is minimal unless you have very specific requirements, because a wide range of basic statistics is built into the base Pandas package and is quick to calculate, including count, sum, mean, median, min, max, std (standard deviation) and var (variance).

The describe() function is a useful summarisation tool that will quickly display statistics for any variable or group it is applied to. Its output varies depending on whether you apply it to a numeric or a character column: numeric columns get the count, mean, standard deviation, min, quartiles and max, while character columns get the count, number of unique values, most frequent value (top) and its frequency (freq).

There's further power put into your hands by mastering the Pandas groupby() functionality. Groupby splits the data into different groups depending on a variable of your choice. For example, the expression data.groupby('month') will split our current DataFrame by month.

The groupby() function returns a GroupBy object, which essentially describes how the rows of the original data set have been split. The GroupBy object's .groups variable is a dictionary whose keys are the computed unique groups and whose values are the axis labels belonging to each group.
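To see describe(), groupby() and .groups side by side without downloading the CSV, here is a minimal sketch on a tiny DataFrame that reuses the month, item and duration column names from this post. The values are invented for illustration and are not taken from the real phone log:

```python
import pandas as pd

# Hypothetical mini phone log: column names follow the post, values are made up
data = pd.DataFrame({
    'month': ['2014-11', '2014-11', '2014-12', '2014-12', '2014-12'],
    'item': ['call', 'sms', 'call', 'data', 'call'],
    'duration': [120.0, 1.0, 300.0, 34.5, 60.0],
})

# describe() on a numeric column: count, mean, std, min, quartiles, max
numeric_summary = data['duration'].describe()

# describe() on a character (object) column: count, unique, top, freq
text_summary = data['item'].describe()

# Split the rows by billing month
grouped = data.groupby('month')

# .groups maps each unique month to the index labels of the rows in that group
month_groups = grouped.groups

# Any statistic can then be computed per group, e.g. number of entries per month
entries_per_month = grouped['item'].count()

print(numeric_summary)
print(text_summary)
print(month_groups)
print(entries_per_month)
```

Swapping 'month' for 'item' in the groupby() call splits the same rows by event type instead.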

Phone numbers were removed for privacy. The date column can be parsed using the extremely handy dateutil library.

```python
import pandas as pd
import dateutil.parser

# Load the phone log from the CSV file
data = pd.read_csv('phone_data.csv')
# Convert the date strings (written day first) into datetimes
data['date'] = data['date'].apply(dateutil.parser.parse, dayfirst=True)
```

Summarising the DataFrame

Once the data has been loaded into Python, Pandas makes the calculation of different statistics very simple. For example, the mean, max, min, standard deviation and more are easily calculated for any column:

```python
# How many rows are in the dataset?
data['item'].count()

# What was the longest phone call / data entry?
data['duration'].max()

# How many seconds of phone calls are recorded in total?
data.loc[data['item'] == 'call', 'duration'].sum()

# How many entries are there for each month?
data['month'].value_counts()
```
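Parsing each string with dateutil works, but pandas can also convert a whole column at once with pd.to_datetime(). The sketch below assumes the log stores dates as day/month/two-digit year followed by hours and minutes; the sample strings are invented and the explicit format string is an assumption about that layout (passing dayfirst=True instead of format is the looser alternative). It also checks that both approaches agree:

```python
import dateutil.parser
import pandas as pd

# Invented day-first date strings; the format below assumes exactly this layout
raw_dates = pd.Series(['15/10/14 06:58', '03/11/14 13:10'])

# Vectorised parsing with an explicit day/month/year format
parsed = pd.to_datetime(raw_dates, format='%d/%m/%y %H:%M')

# The element-by-element dateutil approach, for comparison
via_dateutil = raw_dates.apply(dateutil.parser.parse, dayfirst=True)

# True when both methods produce the same timestamps
both_agree = bool((parsed == via_dateutil).all())

print(parsed)
print(both_agree)
```

With day-first parsing, '03/11/14' is read as 3 November 2014 rather than 11 March.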

The main columns in the file are:

date: The date and time of each entry.

duration: The duration (in seconds) for each call, the amount of data (in MB) for each data entry, and the number of texts sent (usually 1) for each sms entry.

item: A description of the event occurring, which can be one of call, sms, or data.

month: The billing month that each entry belongs to, in the form ‘YYYY-MM’.

network: The mobile network that was called or texted for each entry.

network_type: Whether the number being called was a mobile, international (‘world’), voicemail, landline, or other (‘special’) number.
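Because duration mixes units (seconds for calls, MB for data entries, text counts for sms), a single total over the whole column is not meaningful. A minimal sketch with invented rows, splitting the totals by item before summing:

```python
import pandas as pd

# Hypothetical rows: duration means seconds for calls, MB for data, texts for sms
data = pd.DataFrame({
    'item': ['call', 'call', 'data', 'sms', 'data', 'sms'],
    'duration': [120.0, 45.0, 34.5, 1.0, 10.0, 1.0],
})

# Summing the whole column would add seconds, megabytes and text counts together,
# so total each item type separately instead
totals_by_item = data.groupby('item')['duration'].sum()

print(totals_by_item)
```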

If you’d like to follow along, the full CSV file is available here. The dataset contains 830 entries from my mobile phone log spanning a total of 5 months. The CSV file can be loaded into a pandas DataFrame using the pandas read_csv() function, and looks like this:

[Sample CSV file data containing the dates and durations of phone calls made on my mobile phone.]
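Older tutorials load this kind of file with DataFrame.from_csv(), which was removed in pandas 1.0 and by default used the first column as the index and parsed dates; read_csv() does neither unless asked. A minimal sketch, reading a few invented rows from an in-memory string instead of phone_data.csv and assuming (hypothetically) that the file's first column is a row index:

```python
import io

import pandas as pd

# Hypothetical stand-in for phone_data.csv: invented rows, first column is a row index
csv_text = (
    "index,date,duration,item,month\n"
    "0,15/10/14 06:58,34.5,data,2014-11\n"
    "1,15/10/14 07:10,13.0,call,2014-11\n"
    "2,02/12/14 14:46,1.0,sms,2014-12\n"
)

# index_col=0 uses the first column as the index, as from_csv did by default;
# without it, read_csv adds a fresh 0..n-1 index and keeps 'index' as a data column
data = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(len(data))              # number of entries
print(data.columns.tolist())  # remaining data columns
```

The date column is still plain text after loading, so it is converted separately, as with the dateutil step.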

In order to demonstrate the effectiveness and simplicity of the grouping commands, we will need some data. For an example dataset, I have extracted my own mobile phone usage records. I analysed this type of data using Pandas during my work on KillBiller.