Pandas Number of Consecutive Occurrences in Previous Rows
Problem Description
We are given an OHLC (Open, High, Low, Close) dataset with candle types that can be either ‘green’ (if the close is above open) or ‘red’ (if the open is above the close). The goal is to count the number of consecutive green or red candles for a specified number of previous rows.
Example Data
| open | close | candletype |
|---|---|---|
| 542 | 543 | GREEN |
| 543 | 544 | GREEN |
| 544 | 545 | GREEN |
| 545 | 546 | GREEN |
| 546 | 547 | GREEN |
| 547 | 542 | RED |
| 542 | 543 | GREEN |
Solution
We can use the rolling function in pandas to achieve this. However, a one-liner solution is possible.
First, let’s convert the ‘candletype’ column to a numeric value using its category and then apply the lambda function to rolling windows.
df = pd.read_clipboard()
df['code'] = df.candletype.astype('category').cat.codes
# Apply lambda function to rolling window series
df['pattern'] = df['code'].rolling(3).apply(lambda x: all(x==0)).shift()
Explanation
- We start by reading the clipboard into a pandas DataFrame using
pd.read_clipboard(). - We then convert the ‘candletype’ column to numeric values using its category. This creates a new column named ‘code’.
- Next, we apply a lambda function to rolling windows of size 3.
- The lambda function checks if all elements in the window are equal to 0 (i.e., green candles).
- If true, it returns 1; otherwise, it returns NaN.
- We then shift the result by one row using
.shift(), which fills NaN values with zeros as expected. - The resulting ‘pattern’ series contains the desired information.
Example Use Case
Let’s assume we want to identify rows preceded by exactly 3 green candles. We can achieve this by setting n=3 in the rolling function and checking for df['code'] == 1, which corresponds to a window with all elements equal to 0 (i.e., only green candles).
# Set n to 3
n = 3
# Create a mask to filter rows preceded by exactly n green candles
mask = df['code'].rolling(n).apply(lambda x: all(x==1)).shift()
# Filter rows using the mask
filtered_df = df[mask == True]
Conclusion
We have successfully implemented a pandas-based solution for counting consecutive occurrences of ‘green’ or ‘red’ candles in a specified number of previous rows. The code is concise, readable, and easy to understand, making it suitable for various applications and use cases.
The rolling window approach provides an elegant one-liner solution to this problem, allowing you to efficiently process large datasets while maintaining readability and maintainability.
Code Reuse
The rolling function in pandas can be applied to other data types and series, not just categorical values. This highlights the flexibility and versatility of this function, which makes it a valuable tool for data analysis and processing tasks.
By reusing this approach for different scenarios, you can efficiently analyze and process various datasets while maintaining consistency in your code structure.
Last modified on 2024-08-26