0 to Life-Changing App: Viewing Images with Python 3 and OpenSlide

500 images at around 7 GB each. That's a lot of data. Not sure what I am referring to? Let me fill you in.

Not too long ago I was assigned the project of my dreams here at IBM's Spark Technology Center: create a super life-changing application that incorporates Apache Spark and Apache SystemML. If you've been following along, you'll know that I am barely a year into my data science career, and between my internship here at the STC and my master's degree at UC Berkeley, I have faced a steep learning curve. Because of this, I've decided to blog about every step along the way! That way every data enthusiast and fellow data scientist can follow along and build their own life-changing app. (After all, we might as well crowdsource saving the world.)

My last blog post was a tutorial on how to use the new SystemML API in the Spark shell, but before that, I looked at the frustrating step of finding big, open data. On this quest for delightful data, my team and I came across a breast cancer research competition that was an ideal use case for SystemML and Spark. I mean, it was BIG data, life-changing, and interesting. What's not to love? Let me elaborate. After entering the competition, we were given 500 digital images of breast cancer tissue on medical slides, taken with a microscope. Considering that these images are huge slides (approximately 7 GB each) at 20x to 40x magnification, with 50,000 to 100,000 pixels in each direction, we can safely say we are dealing with really big data! Because of the size, this is an excellent challenge for Apache Spark and Apache SystemML. Our goal is to develop an automated SystemML solution that determines the grade of cancer in any given tissue image. To solve this problem, we will need deep learning and neural networks, but first, we have to clean up our data. That's what this blog post is for!
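To put "really big data" into numbers, here is a rough back-of-the-envelope sketch. It assumes 8-bit RGB pixels (3 bytes each) and computes the size of a slide once fully decoded into memory; the .svs files on disk are typically compressed, so these figures are not file sizes.

```python
# Back-of-the-envelope sizes for whole-slide images.
# Assumption: 8-bit RGB, i.e. 3 bytes per pixel once decoded into memory.
BYTES_PER_PIXEL = 3
GB = 10**9
TB = 10**12


def decoded_size_gb(width_px: int, height_px: int) -> float:
    """Size of one fully decoded RGB slide, in gigabytes (10**9 bytes)."""
    return width_px * height_px * BYTES_PER_PIXEL / GB


# The smallest and largest slide dimensions mentioned above.
small = decoded_size_gb(50_000, 50_000)    # 7.5 GB
large = decoded_size_gb(100_000, 100_000)  # 30.0 GB

# The whole data set as stored: 500 files at roughly 7 GB each.
dataset_tb = 500 * 7 * GB / TB             # 3.5 TB

print(f"one 50k x 50k slide:   {small:.1f} GB decoded")
print(f"one 100k x 100k slide: {large:.1f} GB decoded")
print(f"500 slides x 7 GB:     {dataset_tb:.1f} TB on disk")
```

A single decoded slide can take tens of gigabytes of memory, which is one reason the rest of this post works tile by tile instead of loading whole slides.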

While in this pre-processing stage, I've been able to learn about a ravishing resource for viewing large images: OpenSlide and Deep Zoom. Because of this, I'll first walk you through how to set up and use these tools. After that, we will get started on some pre-processing steps! If you don't have access to images of your own, try this source.

First, update Homebrew:

    brew update
    brew upgrade

Install Python 3:

    brew install python3

Install the Python packages:

    pip3 install -U matplotlib numpy pandas scipy jupyter scikit-learn scikit-image flask

Install OpenSlide (the C library through Homebrew, then its Python bindings):

    brew install openslide
    pip3 install openslide-python

Now, create a new folder and work from there. I named mine AwesomeProject/.
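Before moving on, it can help to confirm that everything actually imports in the Python 3 that pip3 installed into. This is a minimal sketch; the helper name missing_packages is hypothetical, and jupyter is left out because it is mainly a command-line tool. Note that several pip package names differ from the names you import: scikit-learn is imported as sklearn, scikit-image as skimage, and openslide-python as openslide.

```python
import importlib

# pip package name -> name used in the `import` statement.
REQUIRED = {
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
    "flask": "flask",
    "openslide-python": "openslide",
}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the pip names whose modules fail to import."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            # Importing (not just locating) the module also catches the case
            # where openslide-python is installed but the C library is not.
            importlib.import_module(module_name)
        except (ImportError, OSError):
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        # For openslide-python, also re-check `brew install openslide`.
        print("Could not import:", " ".join(missing))
    else:
        print("All packages import correctly.")
```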

*Note: Check my first tutorial for where you installed SystemML; you will need that location for $SYSTEMML_HOME below.
*Note #2: If you don't have tissue images lying around, use this source. Download the .svs files.

Download a few images to get started and place them in a new folder within AwesomeProject/. I called mine data/. Then start your Jupyter notebook:

    PYSPARK_PYTHON=python3 PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master "local[*]" --driver-class-path $SYSTEMML_HOME/SystemML.jar

(The quotes around local[*] keep shells such as zsh from treating the brackets as a file pattern.) Leave this tab running and Jupyter open in your browser. We will come back to it later.
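The launch command only works if SYSTEMML_HOME points at the folder that contains SystemML.jar. As a sketch, the hypothetical helpers below check for the jar first and then pass the same environment variables and pyspark arguments to subprocess; they assume pyspark is on your PATH and are not part of SystemML or Spark.

```python
import os
import shlex
import subprocess
from pathlib import Path


def pyspark_notebook_command(systemml_home: str) -> tuple[dict[str, str], list[str]]:
    """Build the extra environment and argv for starting Jupyter via pyspark.

    Mirrors the shell command above; raises if SystemML.jar is missing.
    """
    jar = Path(systemml_home) / "SystemML.jar"
    if not jar.is_file():
        raise FileNotFoundError(f"{jar} not found; check SYSTEMML_HOME")
    env = {
        "PYSPARK_PYTHON": "python3",
        "PYSPARK_DRIVER_PYTHON": "jupyter",
        "PYSPARK_DRIVER_PYTHON_OPTS": "notebook",
    }
    # No shell is involved, so local[*] needs no quoting here.
    argv = ["pyspark", "--master", "local[*]", "--driver-class-path", str(jar)]
    return env, argv


def launch_notebook(systemml_home: str) -> None:
    """Start pyspark + Jupyter; blocks until the notebook server stops."""
    env_overrides, argv = pyspark_notebook_command(systemml_home)
    print("Running:", shlex.join(argv))
    subprocess.run(argv, env={**os.environ, **env_overrides}, check=True)


# Example: launch_notebook(os.environ["SYSTEMML_HOME"])
```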

Make sure your Jupyter notebook starts up with Python 3 shown in the top right-hand corner. If it doesn't show up automatically, go to Kernel -> Change Kernel -> Python 3. If that doesn't work, you may need to make sure Python 3 is the version being used.
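To check from inside the notebook which interpreter the kernel is actually running, you can run a cell like the one below. It is deliberately written without f-strings or type hints, so it also runs under Python 2 and can report that the wrong kernel is active; sys.executable should point at the Python 3 you installed with Homebrew.

```python
import sys

# Which interpreter is this kernel using?
version = sys.version.split()[0]
print("Python version: " + version)
# str() because sys.executable can be None or empty in unusual setups.
print("Executable:     " + str(sys.executable))
if sys.version_info[0] < 3:
    print("This kernel is Python 2. Use Kernel -> Change Kernel -> Python 3.")
```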

Now, in a new terminal tab, go into your AwesomeProject/ folder. You'll need to clone the openslide-python repository and go into its Deep Zoom example folder to start the viewer. Don't know git? Here's a great tutorial.

    git clone https://github.com/openslide/openslide-python.git
    cd openslide-python/examples/deepzoom
    python3 deepzoom_multiserver.py ../../../data/

Now open the OpenSlide viewer in your browser. After you press Enter, your terminal should say:

    * Running on http://address

Copy that http:// address and paste it into your browser. You should now have two terminal tabs occupied by Jupyter and OpenSlide; leave both of them running. When you go to OpenSlide in your browser, you should see a list of the image files in your data/ folder.
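The ../../../data/ argument is resolved relative to the folder you run the script from, so it only finds your slides if openslide-python was cloned directly inside AwesomeProject/. This small sketch, using the folder names from this post, shows where the path lands in both cases; posixpath.normpath only manipulates strings and never touches the disk.

```python
import posixpath

# Running from AwesomeProject/openslide-python/examples/deepzoom:
# each ".." climbs one folder (deepzoom -> examples -> openslide-python).
slide_dir = posixpath.normpath(
    "AwesomeProject/openslide-python/examples/deepzoom/../../../data/"
)
print(slide_dir)  # AwesomeProject/data

# If the repository were cloned inside data/ instead, the same argument
# points at a data/ folder nested inside data/, which does not exist.
wrong = posixpath.normpath(
    "AwesomeProject/data/openslide-python/examples/deepzoom/../../../data/"
)
print(wrong)  # AwesomeProject/data/data
```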

Click on one of the images to see it.

Once you are viewing the image, you can use your mouse or trackpad to zoom in and out.

Congrats! You've now looked at all of that tissue using OpenSlide and Python 3. Now, let's do our first pre-processing step using Jupyter.

Navigate back to the Jupyter notebook in your browser. Remember, we are still in an exploratory phase, so our aim is to look at example tiles and experiment with them before applying any processing to an entire slide, and most definitely before applying it to all 500 slides.

Our first step is to load everything we need.

    %load_ext autoreload
    %autoreload 2
    %matplotlib inline

    # Add the SystemML PySpark API file.
    # sc is the SparkContext that pyspark creates for the notebook.
    sc.addPyFile("https://raw.githubusercontent.com/apache/incubator-systemml/branch-0.10/src/main/java/org/apache/sysml/api/python/SystemML.py")

    from glob import glob
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    import numpy as np
    import openslide
    from openslide import open_slide
    from openslide.deepzoom import DeepZoomGenerator
    import pandas as pd
    # scipy.ndimage.morphology is deprecated; import from scipy.ndimage instead.
    # binary_closing comes from skimage below, so it is not imported twice.
    from scipy.ndimage import binary_fill_holes, binary_dilation
    from skimage.color import rgb2gray
    from skimage.morphology import closing, binary_closing, disk, remove_small_holes, dilation, remove_small_objects
    from skimage import color, morphology, filters, exposure, feature

    plt.rcParams['figure.figsize'] = (10, 6)

Now we can choose the slide we want to work with.

    # Get the image paths from the data/ folder. glob() returns files in
    # arbitrary order, so sort them to keep slide numbers stable.
    files = sorted(glob("data/*.svs"))
    files

    # Specify which image/slide to use. For this example I will use slide 7.
    slide_num = 7
    slide = open_slide(files[slide_num - 1])

Next we will generate tiles; in other words, slice the image up into smaller squares. This lets us look at the image in more detail and will also help us process the content later: the slides are too large to process as whole images, so we process them tile by tile instead.

    tile_size = 1024
    # overlap adds that many extra pixels to each side of a tile.
    tiles = DeepZoomGenerator(slide, tile_size=tile_size, overlap=0, limit_bounds=False)

    # See how many tiles there are at each level of magnification.
    tiles.level_tiles

    # Choose a tile to look at. The last level is full resolution; change the
    # (column, row) address to get the tile you are looking for.
    # This is where the OpenSlide viewer helps.
    tile = tiles.get_tile(tiles.level_count - 1, (85, 35))
    tile

Below are examples of what I did.
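To see what tiles.level_count and tiles.level_tiles are counting, it helps to redo the bookkeeping by hand. A Deep Zoom pyramid starts at a 1x1 image, each level is half the size of the next one (rounded up), and the last level is the full-resolution slide; every level is then cut into tile_size squares. The sketch below follows my reading of how openslide.deepzoom.DeepZoomGenerator sizes its levels with overlap=0 and limit_bounds=False, so treat it as an approximation and compare it with tiles.level_dimensions and tiles.level_tiles on your own slide. The function name deep_zoom_levels and the 100,000 x 50,000 slide size are hypothetical examples.

```python
import math


def deep_zoom_levels(width: int, height: int, tile_size: int = 1024):
    """Return (level_dimensions, level_tiles) for a Deep Zoom pyramid.

    Assumes no tile overlap. Level 0 is 1x1; the last level is full size.
    """
    dims = [(width, height)]
    w, h = width, height
    # Halve (rounding up) until both sides reach a single pixel.
    while w > 1 or h > 1:
        w, h = max(1, math.ceil(w / 2)), max(1, math.ceil(h / 2))
        dims.append((w, h))
    dims.reverse()  # smallest level first, full resolution last
    # Partial tiles at the right and bottom edges still count as tiles.
    tiles = [(math.ceil(w / tile_size), math.ceil(h / tile_size)) for w, h in dims]
    return dims, tiles


level_dims, level_tiles = deep_zoom_levels(100_000, 50_000, tile_size=1024)
print(len(level_dims))  # 18 levels, the analogue of tiles.level_count
print(level_dims[-1])   # (100000, 50000): the last level is full resolution
print(level_tiles[-1])  # (98, 49): columns x rows of tiles at that level
```

Since get_tile takes a (column, row) address, (85, 35) at the last level only exists when that level is at least 86 tiles wide and 36 tiles tall, which is why the coordinates need adjusting from slide to slide.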

Look at you! You have generated your tiles and visualized some examples! You are now officially an expert at OpenSlide after looking at images of tissue, loading your images, and visualizing some example tiles. Next up will be further pre-processing steps and exploration. Once we have finished our pre-processing on example tiles, we will be able to apply it to all of our slides and use our Spark cluster. This will be followed by our fancy SystemML steps. It seems we are well on our way to changing lives.

Stay tuned for more!

By Madison J. Myers
