
Basic Machine Learning in Python with Scikit-learn


Machine learning has become a hot topic in the last few years, and for good reason: it provides data analysts with efficient ways of extracting information from data, allowing it to be used for analysis and modeling purposes.

The Scikit-learn Python library has implementations of dozens of learning algorithms and is freely available for academic and commercial use under the terms of the BSD licence. Some of these algorithms can be extremely useful for our job as water systems analysts, so given the overwhelming number of algorithms implemented in Scikit-learn, I thought I would mention a few I find particularly useful for my research. For each method below I included links with examples from Scikit-learn's website. Installation and use instructions can be found on their website.
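(For quick reference, installing from PyPI is typically just a matter of running pip install scikit-learn, though the exact command may depend on how your Python environment is set up.)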

CART Trees

CART trees can be used for regression or classification. Like any tree, CART trees are considered poor (generally high-variance) classifiers unless bootstrapped or boosted (see supervised learning), but the resulting rules are easily interpretable.

CART Trees: http://scikit-learn.org/stable/modules/tree.html#tree-algorithms-id3-c4-5-c5-0-and-cart
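As a minimal sketch of fitting and inspecting a CART classification tree (using the Iris data set shipped with Scikit-learn rather than water systems data, and parameters chosen purely for illustration):

# Minimal CART classification tree sketch using Scikit-learn's built-in Iris data
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Load example data and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Limiting the depth keeps the resulting rules short and easy to interpret
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("Test accuracy:", tree.score(X_test, y_test))
# Print the learned if/else rules, which are the main appeal of CART trees
print(export_text(tree, feature_names=load_iris().feature_names))

The printed rules are what make trees attractive when interpretability matters more than raw predictive accuracy.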

Dimensionality reduction

Principal Component Analysis (PCA) is perhaps the most widely used dimensionality reduction technique. It works by finding the basis that maximizes the data's variance, allowing for the elimination of axes that have low variances. Among its uses are noise reduction, data visualization (as it preserves the distances between data points), and improvement of the computational efficiency of other algorithms by getting rid of redundant information. PCA can be used in its pure form, or it can be kernelized to handle data sets whose variance is maximum in a non-linear direction. Manifold learning is another way of performing dimensionality reduction, by unwinding the lower-dimensional manifold where the information lies. A short sketch follows the links below.

PCA: http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_3d.html#sphx-glr-auto-examples-decomposition-plot-pca-3d-py

Kernel PCA: http://scikit-learn.org/stable/auto_examples/decomposition/plot_kernel_pca.html#sphx-glr-auto-examples-decomposition-plot-kernel-pca-py

Manifold learning: http://scikit-learn.org/stable/auto_examples/manifold/plot_compare_methods.html#sphx-glr-auto-examples-manifold-plot-compare-methods-py
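As a rough illustration of that workflow (again on the Iris data rather than a water systems data set, with illustrative parameters):

# Sketch: reduce a data set to its two leading principal components
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Standardizing first is usually advisable so no single variable dominates the variance
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_std)

# Fraction of the total variance captured by the two retained components
print("Explained variance ratio:", pca.explained_variance_ratio_)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()

Swapping PCA for sklearn.decomposition.KernelPCA (e.g. with kernel='rbf') gives the kernelized variant mentioned above.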

Clustering

Clustering is used to group similar points in a data set. One example is the problem of finding customer niches based on the products each customer buys. The most famous clustering algorithm is k-means, which, as any other machine learning algorithm, works well on some data sets but not on others. There are several alternative algorithms, all of which are exemplified in the following two links:

Clustering algorithmscomparison: http://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html#sphx-glr-auto-examples-cluster-plot-cluster-comparison-py

Gaussian Mixture Models (finds similar results to k-means but also provides variances): http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_covariances.html#sphx-glr-auto-examples-mixture-plot-gmm-covariances-py

Reducing the dimensionality of a data set with PCA or kernel PCA may speed up clustering algorithms, as sketched below.
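A small sketch combining those ideas, running k-means on PCA-reduced data and comparing it with a Gaussian mixture model (synthetic data for illustration, not a real customer or water systems data set):

# Sketch: PCA followed by k-means and a Gaussian mixture model on synthetic blobs
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Synthetic data with 4 clusters in 10 dimensions
X, _ = make_blobs(n_samples=500, n_features=10, centers=4, random_state=0)

# Reduce dimensionality first, which can speed up the clustering step
X_reduced = PCA(n_components=3).fit_transform(X)

kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_reduced)

# The Gaussian mixture gives soft assignments plus per-cluster covariances
gmm = GaussianMixture(n_components=4, covariance_type='full', random_state=0).fit(X_reduced)
gmm_labels = gmm.predict(X_reduced)

print("k-means labels:", kmeans_labels[:10])
print("GMM labels:    ", gmm_labels[:10])
print("GMM covariance array shape:", gmm.covariances_.shape)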

Supervised learning

Supervised learning algorithms can be used for regression or classification problems (e.g. classifying a point as pass/fail) based on labeled data sets. The most "trendy" one nowadays is neural networks, but support vector machines, boosted and bagged trees, and others are also options that should be considered and tested on your data set (a small comparison sketch follows the links below). Below are links to some of the supervised learning algorithms implemented in Scikit-learn:

Comparison between supervised learning algorithms : http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html#sphx-glr-auto-examples-classification-plot-classifier-comparison-py

Neural networks: http://scikit-learn.org/stable/modules/neural_networks_supervised.html
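As a hedged sketch of that kind of comparison (synthetic two-class data; the classifiers and parameters are just illustrative defaults, not a recommendation):

# Sketch: compare a few supervised classifiers on the same synthetic data set
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "SVM (RBF kernel)": SVC(),
    "Random forest (bagged trees)": RandomForestClassifier(random_state=0),
    "AdaBoost (boosted trees)": AdaBoostClassifier(random_state=0),
    "Neural network (MLP)": MLPClassifier(max_iter=1000, random_state=0),
}

# Fit each model and report held-out accuracy; no single method wins on every data set
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")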

Gaussian processes are also a supervised learning method (for regression) and can also be used for Bayesian optimization:

Gaussian processes: http://scikit-learn.org/stable/auto_examples/gaussian_process/plot_gpr_noisy_targets.html#sphx-glr-auto-examples-gaussian-process-plot-gpr-noisy-targets-py
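A minimal Gaussian process regression sketch on a noisy 1-D toy function (the kernel choice here is just an illustrative default):

# Sketch: Gaussian process regression on a noisy 1-D toy function
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(30, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(30)

# RBF kernel plus a white-noise term to account for the noisy observations
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

# Predictive mean and standard deviation; the uncertainty estimate is
# what Bayesian optimization exploits when choosing where to sample next
X_new = np.linspace(0, 10, 5).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)
print(np.c_[X_new.ravel(), mean, std])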

