Quantcast
Channel: CodeSection,代码区,Python开发技术文章_教程 - CodeSec
Viewing all articles
Browse latest Browse all 9596

Thomas Wiecki on Probabilistic Programming with PyMC3

$
0
0

Thomas Wiecki on Probabilistic Programming with PyMC3

A rolling regression with PyMC3 : instead of the regression coefficients being constant over time (the points are daily stock prices of 2 stocks), this model assumes they follow a random-walk and can thus slowly adapt them over time to fit the data best.

Probabilistic programming is coming of age. While normal programming languages denote procedures, probabilistic programming languages denote models and perform inference on these models. Users write code to specify a model for their data, and the languages run sampling algorithms across probability distributions to output answers with confidence rates and levels of uncertainty across a full distribution. These languages, in turn, open up a whole range of analytical possibilities that have historically been too hard to implement in commercial products.

One sector where probabilistic programming will likely have significant impact is financial services. Be it when predicting future market behavior or loan defaults, when analyzing individual credit patterns or anomalies that might indicate fraud, financial services organizations live and breathe risk. In that world, a tool that makes it easy and fast to predict future scenarios while quantifying uncertainty could have tremendous impact. That’s why Thomas Wiecki , Director of Data Science for the crowdsourced investment management firmQuantopian, is so excited about probabilistic programming and the new release of PyMC3 3.0 .

We interviewed Dr. Wiecki to get his thoughts on why probabilistic programming is taking off now and why he thinks it’s important. Check outhis blog, and keep reading for highlights!

A key benefit of probabilistic programming is that it makes it easier to construct and fit Bayesian inference models . You have a history working with Bayesian methods in your doctoral work on cognition and psychiatry. How did you use them?

One of the main problems in psychiatry today is that disorders like depression or schizophrenia are diagnosed based purely on subjective reporting of symptoms, not biological traits you can measure. By way of comparison, imagine if a cardiologist were to prescribe heart medication based on answers you gave in a questionnaire! Even the categories used to diagnose depression aren’t that valid, as two patients may have completely different symptoms, caused by different underlying biological mechanisms, but both fall under the broad category “depressed.” My thesis tried to change that by identifying differences in cognitive function rather than reported symptoms to diagnose psychiatric diseases. Towards that goal, we used computational models of the brain, estimated in a Bayesian framework, to try to measure cognitive function. Once we had accurate measures of cognitive function, we used machine learning to train classifiers to predict whether individuals were suffering from certain psychiatric or neurological disorders. The ultimate goal was to replace disease categories based on subjective descriptions of symptoms with objectively measurable cognitive function. This new field of research is generally known as computational psychiatry, and is starting to take root in industries like pharmaceuticals to test the efficacy of new drugs.

What exactly was Bayesian about your approach?

We mainly used it to get accurate fits of our models to behavior. Bayesian methods are especially powerful when there is hierarchical structure in data. In computational psychiatry, individual subjects either belong to a healthy group or a group with psychiatric disease. In terms of cognitive function, individuals are likely to share similarities with other members of their group. Including these groupings into a hierarchical model gave more powerful and informed estimates about individual subjects so we could make better and more confident predictions with less data.


Thomas Wiecki on Probabilistic Programming with PyMC3

Bayesian inference provides robust means to test hypotheses by estimating how different two different groups are from one another.

How did you go from computational psychiatry to data science at Quantopian?

I started working part-time at Quantopian during my PhD and just loved the process of building an actual product and solving really difficult applied problems. After I finished my PhD, it was an easy decision to come on full-time and lead the data science efforts there. Quantopian is a community of over 100.000 scientists, developers, students, and finance professionals interested in algorithmic trading. We provide all the tools and data necessary to build state-of-the-art trading algorithms. As a company, we try to identify the most promising algorithms and work with the authors to license them for our upcoming fund, which will launch later this year. The authors retain the IP of their strategy and get a share of the net profits.

What’s one challenging data science problem you face at Quantopian?

Identifying the best strategies is a really interesting data science problem because people often overfit their strategies to historical data. A lot of strategies thus often look great historically but falter when actually used to trade with real money. As such, we let strategies bake in the oven a bit and accumulate out-of-sample data that the author of the strategy did not have access to, simply because it hadn’t happened yet when the strategy was conceived. We want to wait long enough to gain confidence, but not so long that strategies lose their edge. Probabilistic programming allows us to track uncertainty over time, informing us when we’ve waited long enough to have confidence that the strategy is actually viable and what level of risk we take on when investing in it.

It’s tricky to understand probabilistic programming when you first encounter it. How would you define it?

Probabilistic programming allows you to flexibly construct and fit Bayesian models in computer code. These

Viewing all articles
Browse latest Browse all 9596

Trending Articles