
Hello! I do more or less data-scientist-like work for a living. ✨
Today, for a change (?), I thought I'd write about work.

I've been using Polars a lot at work lately, and although it resembles Pandas, I keep noticing differences in how it feels to use.
Polars has been around for a while now, so this may be a little late, but here are my impressions after trying it out for myself.

What is Pandas? What is Polars?
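Roughly speaking, both are Python libraries for working with DataFrames (tabular data).
- Pandas: the long-established standard library. It is widely used for processing Excel and CSV files, aggregating and reshaping data, and making charts.
- Polars: a newer library that has been getting attention recently. It is built on Rust and is known for being very fast.
At first I used nothing but Pandas, but many people at my workplace use Polars, and after they recommended it I decided to cheat on Pandas a little. I ended up thinking, "Huh, this feels snappy and kind of nice..." (lol)

Example 1: Loading is really fast (feels about 3x faster)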
One day I was loading and processing a CSV file of around 100 MB, and with Pandas the load took about 3 to 4 seconds.
When I tried loading it with Polars instead, it took under 1 second!
That is only a few seconds saved, but with much larger data it can add up to tens of minutes.
# Pandas
import pandas as pd
df = pd.read_csv("bigdata.csv")
# Polars
import polars as pl
df = pl.read_csv("bigdata.csv")

For basic operations like this, the syntax hardly differs from Pandas.
ChatGPT can also rewrite Pandas code into Polars almost accurately, so converting existing Pandas code to Polars is fairly easy too.

Example 2: groupby aggregations are cleaner to write
A staple of data analysis is computing things like the mean or the total for each group.
You can do this in Pandas too, but the syntax sometimes feels a bit complicated...
# Pandas
df.groupby("category")["sales"].sum().reset_index()
# Polars (`groupby` was renamed to `group_by` in Polars 0.19)
df.group_by("category").agg(pl.col("sales").sum())
In Polars this .agg() is very easy to follow, and personally I like it.
Adding more aggregations doesn't make it any harder to read.
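# Multiple aggregations (Polars)
# Each output column needs a unique name, hence the aliases.
df.group_by("category").agg([
    pl.col("sales").mean().alias("sales_mean"),
    pl.col("sales").max().alias("sales_max"),
])

It looks clean, and it's easy to see at a glance what the code is doing.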
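
Example 3: Memory use feels lighter
This is purely a gut feeling, but memory usage seems to stay lower when I use Polars.
On an old laptop, Polars ran smoothly even in situations where Pandas would get a bit sluggish.
In particular, not copying the data is the default behavior, so I get the impression that memory consumption often stays low even in code that assigns results to variables over and over.
There is a caveat, though: the original data can end up being changed unintentionally, for example when an in-place method is called on a DataFrame that another variable still refers to. It seems wise to be careful until you get used to it.
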

That said, I love Pandas too
I've been leaning toward Polars so far, but it's also true that Pandas has strengths of its own.
- It works well with visualization libraries (Seaborn, Matplotlib, and so on)
- Learning materials and sample code are plentiful, so it's easy to search when you get stuck
- It is extremely easy to use together with Jupyter Notebook
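Also, if the data is small or the processing is light, Pandas is perfectly fine. When it comes to being easy to write and approachable, I still think Pandas is the go-to.
Polars takes a little getting used to, so my current style is to use both, each where it fits.

Summary: Maybe it's all about using each where it fits?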
This time I wrote, from a data scientist's point of view, about how Pandas and Polars differ in feel.
It's not about which one is correct. I think it's best to choose each time based on which one feels better to use in that situation.
In my case:
- Light preprocessing and visualization → Pandas
- Preprocessing and aggregating large amounts of data → Polars
That's roughly how I choose.

For all sorts of preprocessing techniques, this book was a great help.
Books explaining Polars are still few and far between, and I found it had a lot of useful information. The data scientists at my workplace all have a copy, too.

And meanwhile, our cats, sleeping quietly beside me, couldn't care less what tools I use and are simply enjoying how comfy their cat tower is...



