
Machine learning's exciting, but the work is complex and difficult. It typically involves a lot of manual heavy lifting -- assembling workflows and pipelines, setting up data sources, and shunting back and forth between on-prem and cloud-deployed resources.
The more tools you have in your belt to make that job easier, the better. Thankfully, Python is a giant tool belt of a language that's widely used in big data and machine learning. Here are five Python libraries that help make that heavy lifting a little less heavy.
PyWren

A simple package with a powerful premise, PyWren lets you run Python-based scientific computing workloads as multiple instances of AWS Lambda functions. A profile of the project at The New Stack describes how PyWren uses AWS Lambda as a giant parallel processing system, tackling projects that can be sliced and diced into small tasks that don't need a lot of memory or storage to run.
One downside is that Lambda functions can't run for more than 300 seconds. But if you have a job that takes only a few minutes to complete and needs to run thousands of times across a dataset, PyWren may be a good way to parallelize that work in the cloud at a scale unavailable on end-user hardware.
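As a sketch of the pattern, a small, self-contained function can be mapped across many inputs with PyWren's executor. The local fallback below is our addition for illustration, not part of PyWren itself; `default_executor` and `map` follow PyWren's documented usage, but treat the details as an assumption:

```python
# Sketch of the PyWren pattern: map a small, self-contained function
# across many inputs. The local fallback is our addition so the sketch
# runs without AWS; it is not part of PyWren itself.
def square(x):
    return x * x

def run_parallel(func, data):
    try:
        import pywren  # needs AWS credentials and a configured PyWren setup
        pwex = pywren.default_executor()
        futures = pwex.map(func, list(data))
        return [f.result() for f in futures]
    except Exception:
        # No PyWren/AWS available here: run serially on this machine.
        return [func(x) for x in data]

results = run_parallel(square, range(5))
```

Either path returns the same results; only where the work happens changes.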
Tfdeploy

Google's TensorFlow framework is taking off big-time now that it's at a full 1.0 release. One common question about it: How can I make use of the models I train in TensorFlow without using TensorFlow itself?
Tfdeploy is a partial answer to that question. It exports a trained TensorFlow model to "a simple NumPy-based callable," meaning the model can be used in Python with the only dependencies being Tfdeploy and the NumPy math-and-stats library. Most of the operations you can perform in TensorFlow can also be performed in Tfdeploy, and you can extend the behaviors of the library by way of standard Python idioms (e.g., overloading a class).
Now the bad news: Tfdeploy doesn't support GPU acceleration, if only because NumPy doesn't do that. Tfdeploy's creator suggests using the gNumPy project as a possible replacement.
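To make the idea of a "NumPy-based callable" concrete, here is a toy model reduced to stored weights plus a pure-NumPy forward pass. The weights are invented for the example, and this shows the shape of what Tfdeploy produces rather than Tfdeploy's actual API:

```python
import numpy as np

# Toy illustration of a model as a NumPy-based callable: a single
# dense layer reduced to stored weights plus a pure-NumPy forward
# pass. The weights below are invented for the example.
W = np.array([[0.5, -0.2],
              [0.1,  0.3]])
b = np.array([0.0, 1.0])

def model(x):
    # Affine transform followed by ReLU -- no TensorFlow required.
    return np.maximum(x @ W + b, 0.0)

out = model(np.array([1.0, 2.0]))  # array([0.7, 1.4])
```

Everything the model needs at inference time is a couple of arrays and a function, which is exactly why the dependency list shrinks to NumPy.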
Luigi

Writing batch jobs is generally only one part of processing heaps of data; you also have to string all those jobs together into something resembling a workflow or a pipeline. Luigi, created by Spotify and named for the other plucky plumber made famous by Nintendo, was built to "address all the plumbing typically associated with long-running batch processes."
With Luigi, a developer can take several different unrelated data processing tasks -- "a Hive query, a Hadoop job in Java, a Spark job in Scala, dumping a table from a database" -- and create a workflow that runs them, end-to-end. Jobs and all of their dependencies are described as Python modules, not as XML config files or some other data format, so they can be integrated into other Python-centric projects.
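The requires-then-run contract those workflows follow can be sketched with a toy stand-in. Real tasks subclass `luigi.Task`, declare `output()` targets, and are run by Luigi's scheduler; everything below is a simplified imitation of that contract, not Luigi's API:

```python
# Toy imitation of Luigi's task contract: each task declares what it
# requires, and a tiny "scheduler" runs dependencies before the task.
# Real Luigi tasks subclass luigi.Task and write results to targets.
class Task:
    def requires(self):
        return []            # upstream tasks, run first
    def run(self, upstream):
        raise NotImplementedError

class Extract(Task):
    def run(self, upstream):
        return [1, 2, 3]     # pretend this dumps a table

class Transform(Task):
    def requires(self):
        return [Extract()]
    def run(self, upstream):
        return [x * 10 for x in upstream[0]]

def build(task):
    # Depth-first: resolve dependencies, then run the task itself.
    return task.run([build(t) for t in task.requires()])

result = build(Transform())  # [10, 20, 30]
```

Because the whole pipeline is ordinary Python classes, dependencies are plain method calls rather than a separate configuration language -- which is the point of Luigi's design.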
Kubelib

If you're adopting Kubernetes as an orchestration system for machine learning jobs, the last thing you want is for the mere act of using Kubernetes to create more problems than it solves. Kubelib provides a set of Pythonic interfaces to Kubernetes, originally as a way to aid Jenkins scripting. But it can be used without Jenkins as well, and it can do everything exposed through the kubectl CLI or the Kubernetes API.
PyTorch

Let's not forget this recent, high-profile addition to the Python world, an implementation of the Torch machine learning framework. PyTorch doesn't just port Torch to Python; it adds many other conveniences, such as GPU acceleration and a library that allows multiprocessing to be done with shared memory (for partitioning jobs across multiple cores). Best of all, it can provide GPU-powered replacements for some of the unaccelerated functions in NumPy.
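As a sketch of that last point, the same matrix multiply can run through NumPy or, when PyTorch is installed, through PyTorch, which pushes the work onto a GPU when one is present. The availability checks are ours; `torch.from_numpy`, `torch.mm`, and `.cuda()` are standard PyTorch calls:

```python
import numpy as np

# The same matrix multiply via NumPy and, when installed, via
# PyTorch, which can move the work onto a GPU with the same code path.
a = np.random.rand(64, 64).astype("float32")
b = np.random.rand(64, 64).astype("float32")
cpu_result = a @ b

try:
    import torch
    ta, tb = torch.from_numpy(a), torch.from_numpy(b)
    if torch.cuda.is_available():
        ta, tb = ta.cuda(), tb.cuda()  # same ops, now GPU-backed
    torch_result = torch.mm(ta, tb).cpu().numpy()
except ImportError:
    torch_result = None  # PyTorch not installed; the NumPy path still works
```

On a machine with a CUDA GPU, the only change needed to accelerate the computation is moving the tensors, which is what makes PyTorch attractive as a drop-in upgrade for NumPy-heavy code.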