Get faster pandas with Modin
The modin.pandas DataFrame is an extremely lightweight, robust DataFrame.
Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical.
Modin currently supports 93% of the pandas API based on a study of pandas usage.
Why Modin?
Modin uses Ray or Dask to provide an effortless way to speed up your pandas notebooks, scripts, and libraries.
modin.pandas provides speed-ups of up to 4x on a laptop with 4 physical cores.
In pandas, you are only able to use one core at a time when you are doing computation of any kind. With modin.pandas, you are able to use all of the CPU cores on your machine.
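To see how many cores Modin could spread work across on a given machine, the standard library is enough. The snippet below is a minimal sketch and uses no Modin API; `available_cores` is a hypothetical helper name. Note that `os.cpu_count()` reports logical cores, which can be higher than the physical core count on CPUs with hyper-threading.

```python
import os


def available_cores() -> int:
    """Return the number of logical CPU cores, falling back to 1 if unknown."""
    # os.cpu_count() can return None when the count cannot be determined.
    return os.cpu_count() or 1


cores = available_cores()
print(f"This machine exposes {cores} logical core(s) that Modin could use.")
```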
Once you’ve changed your import statement, you’re ready to use modin.pandas just like you would pandas.
1. Installation of Modin:
- Installing with pip:
pip install -U modin
- Installing from the GitHub master branch:
pip install git+https://github.com/modin-project/modin
If you don’t have Ray or Dask installed, you will need to install Modin with one of the targets:
# Install Modin dependencies and Ray to run on Ray
# (the quotes keep shells such as zsh from expanding the square brackets)
pip install "modin[ray]"
# Install Modin dependencies and Dask to run on Dask
pip install "modin[dask]"
# Install all of the above
pip install "modin[all]"
Note: For installation on Windows, we recommend using the Dask engine. At the time of writing, Ray does not support Windows, so it will not be possible to install modin[ray] or modin[all] there. It is possible to use Windows Subsystem for Linux (WSL), but this is generally not recommended because of Ray's limitations and poor performance on WSL, at a cost of roughly 2–3x.
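If both engines might be present, the engine can be chosen explicitly. According to the Modin documentation, Modin reads the `MODIN_ENGINE` environment variable, which should be set before `modin.pandas` is imported. The sketch below follows the note above by picking Dask on Windows and Ray elsewhere; `choose_engine` is a hypothetical helper, not part of Modin.

```python
import os
import platform


def choose_engine(system: str = "") -> str:
    """Hypothetical helper: pick "dask" on Windows and "ray" elsewhere."""
    system = system or platform.system()
    return "dask" if system == "Windows" else "ray"


# Set the engine before importing modin.pandas so Modin can pick it up.
os.environ["MODIN_ENGINE"] = choose_engine()
print("MODIN_ENGINE =", os.environ["MODIN_ENGINE"])
```

The chosen engine still has to be installed with the matching pip target; this only tells Modin which one to use.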
2. Importing Modin Pandas:
import modin.pandas as pd
3. Reading Dataset:
df = pd.read_csv("example.csv")
# Top 5 records
df.head()
# Last 5 records
df.tail()
# Shape of dataset
df.shape
# Data types
df.dtypes
# Description of dataset
df.describe()
# Count NaN values per column
df.isnull().sum()
# Drop a column
df.drop("feature_name", axis=1, inplace=True)
# Column names
df.columns
# Unique values in a column (use a column that has not been dropped)
df["feature_name"].unique()
These are some of the basic operations.
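The operations above can be tried end to end on a tiny file. The sketch below is only an illustration under a few assumptions: the three-row dataset and its column names (`city`, `temp`) are made up, the file is written to a temporary directory, and the import falls back to plain pandas when Modin is not installed so the code runs either way.

```python
import csv
import os
import tempfile

try:
    import modin.pandas as pd  # parallel drop-in replacement for pandas
except ImportError:
    import pandas as pd  # fallback so the sketch still runs without Modin

# Write a small, made-up dataset to a temporary CSV file.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["city", "temp"])
    writer.writerows([["Pune", 31.0], ["Delhi", ""], ["Pune", 29.5]])

df = pd.read_csv(path)

# Inspect the data before changing it.
n_rows, n_cols = df.shape
missing = int(df["temp"].isnull().sum())  # the empty field is read as NaN
cities = sorted(df["city"].unique())

# Drop a column, then list what is left.
df = df.drop("temp", axis=1)
remaining = list(df.columns)

print(n_rows, n_cols, missing, cities, remaining)
```

Here `drop` returns a new DataFrame instead of using `inplace=True`; both styles work, and reassigning makes it easier to see which columns exist at each step.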
Conclusions:
Modin is an early-stage DataFrame library that wraps pandas and transparently distributes the data and computation, accelerating your pandas workflows with a one-line code change. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their existing pandas notebooks while getting a considerable speedup from Modin, even on a single machine. Only the import statement needs to change, as shown in step 2; after that, you use Modin just like you would pandas, since it covers most of the pandas API.
Reference:
Pandas: https://pandas.pydata.org/docs/
Modin Pandas: https://modin.readthedocs.io/en/latest/index.html
Thank you!
Originally published at https://inblog.in.