Moving averages - how to calculate them in Pandas?
- staniszradek
- 10 gru 2024
- 3 minut(y) czytania
In the world of data analysis, especially when dealing with time-series data, identifying trends can be challenging due to fluctuations and/or noise in the data. Moving Average is a powerful statistical technique that helps smooth out these fluctuations and makes patterns easier to spot.
What is a moving average?
We calculate moving averages to analyze data points by creating averages of different subsets of the dataset. It helps to smooth short-term fluctuations and highlight longer-term trends or cycles. For example, if you're analyzing stock prices, a moving average can give you a better understanding of the stock's overall trend over a specified period (e.g., 50 days, 200 days).
Types of Moving Averages:
Simple Moving Average (SMA): The average of a fixed number of previous data points.
Weighted Moving Average (WMA): Assigns different weights to data points, usually giving more importance to recent points.
Exponential Moving Average (EMA): Gives exponentially more weight to recent data points compared to older ones.
When it comes to Simple Moving Average (SMA) we simply take the arithmetic mean of the last N data points. The formula along with the example of SMA calculation can be viewed here.
Let's now focus on how to calculate SMA in Pandas. First, we need some sample dataset. Let's generate some random values from range 100+-20 for the last month (30 days):
data = {
"Date": pd.date_range(start="2024-11-10", periods=30, freq="D"),
"Sales": [100 + np.random.randint(-20, 20) for _ in range(30)]
}
df = pd.DataFrame(data)
df.set_index("Date", inplace=True)
Now our dataframe looks like this:

We have 30 data points (last month) representing some sales. Let's say we want to calculate 5-day Simple Moving Average to spot if there's any trend. To do this we'll employ rolling method, which groups data in our case every 5 consecutive days:
df["SMA_5"] = df["Sales"].rolling(window=5).mean()
As a result we get:

We can see the first 4 rows of SMA_5 column have NaNs as there's no previous data points to calculate 5-day SMA for these observations. The first day we can calculate 5-day SMA is 14th of November. Let's quickly check whether the calculation is correct:
SMA = (109+110+88+84+85) / 5 = 476/5 = 95.2
Alright, it looks like our moving average is calculated correctly using rolling function. Let's now visualize it.
plt.plot(df.index, df["Sales"], label="Daily Sales")
plt.plot(df.index, df["SMA_5"], label="5-Day Moving Average", color="orange")
plt.title("Daily Sales vs. 5-Day Moving Average")
plt.xlabel("Date")
plt.ylabel("Sales")
plt.legend()
plt.grid()
plt.show()

Looking at the blue line which corresponds to Daily Sales, we can see it is very chaotic and goes up and down every few days. It's hard to tell whether sales are in some trend. On the other hand, when we look at the orange line which represents our 5-day SMA, we immidiately see that starting from 17th of November sales have been in a slight upward trend.
We need to keep in mind that SMA treats all data points equally, making it simple but less sensitive to recent changes. If we need to be more responsive to changes, we should consider Weighted Moving Average (WMA) or Exponential Moving Average (EMA). In WMA, more recent data points are given higher weights while EMA applies weights exponentially.
Let's see how to calculate EMA in Pandas:
df["EMA_5"] = df["Sales"].ewm(span=5, adjust=False).mean()

This time we didn't get any NaNs, because EMAs are calculated according to a different formula than SMAs. Here's the formula.
Let's take the 14th of November as an example again and check what the ewm function really does.
Since we are interested in calculating 5-day EMA, the value of the smoothing factor (alpha ) will be: 2/(5+1) = 0.33(3)
So if we plug all numbers into the formula we get:
EMA(14th Nov) = 0.33(3) * 85 + (1 - 0.33(3)) * EMA(13th Nov) = 28.33(3) + 64,098765(3) = 92,432099
We received the same result as ewm function did. Finally, let's compare visually the difference between SMA and EMA:

Let's take a look at the data point from 22 of November. Sales were equal to 112 on that day. Both SMA and EMA were roughly the same (98.4).
The next day, the price dropped significantly and reached 98. What happened to SMA? It increased by 3 (despite the significant price drop) because it treats last 5 days equally while EMA decreased by roughly 0.13 as it gives exponentially more weight to recent data points.
To sum up, calculating moving averages can be really useful technique to smooth data and identify trends. Pandas library makes it pretty straightforward with the .rolling() and ewm() methods. Simple moving averages are more stable and not as responsive to sudden changes as exponential ones, however, they're still very good at identifying the overall trend.
Comments