Analyzing social networks using Python and SAS Viya

The study of social networks has gained importance over the years within social and behavioral research on HIV and AIDS. Social network research can reveal routes of potential viral transfer and can be used to understand the influence of peer norms and practices on the risk behaviors of individuals.

This example analyzes the results of a study of high-risk drug use for HIV prevention in Hartford, Connecticut, using Python and SAS. The social network has 194 nodes and 273 edges, which represent drug users and the connections between those users.
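To put these numbers in perspective, the short calculation below derives the average degree and the density from the node and edge counts quoted above. It is only a sketch: it assumes the network is treated as an undirected simple graph (no self-loops or duplicate edges), which the study description does not state explicitly.

```python
# Node and edge counts quoted in the article.
num_nodes = 194
num_edges = 273

# Assumption: undirected simple graph, so each edge adds one to the
# degree of two nodes.
average_degree = 2 * num_edges / num_nodes

# Density = actual edges / number of possible edges between the nodes.
possible_edges = num_nodes * (num_nodes - 1) / 2
density = num_edges / possible_edges

print(f"average degree: {average_degree:.2f}")  # about 2.81
print(f"density: {density:.4f}")                # about 0.0146
```

Under that assumption, each user has fewer than three connections on average, so the network is sparse.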

Background

SAS support for network analysis has been around for a while. In fact, I have shown related techniques using SAS Visual Analytics in my previous post. If you are new to social network analysis, you may want to review that post first, as it provides a great introduction to the world of networks.

This post is written for the application developer or data scientist who has programming experience and seeks self-service access to comprehensive analytics. I will highlight how to gain access to SAS Viya using the REST API in Python and demonstrate how to drive a simple analytical pipeline to analyze a social network.

The recent release of SAS Viya provides a full set of innovative algorithms and proven analytical methods for exploring experimental questions, and it is built on an open architecture. This means you can integrate SAS Viya seamlessly into your application infrastructure and drive analytical models using any programming language. This blog post highlights one example of how this openness can be used to access powerful SAS analytics.

Prerequisites

While you could simply issue a series of REST API calls to access the data, it's typically more efficient to use a programming language to structure your work and make it repeatable. I decided to use Python, as it's very popular among young data scientists and very common in universities.

For demonstration purposes, I'm using an interface called Jupyter, an open and interactive web-based platform capable of running Python code as well as embedding markup text. The SAS community also hosts many additional examples for accessing SAS data with Jupyter. In fact, Jupyter supports many different programming languages, including SAS. You may also be interested in trying out the related SAS kernel.

After installing Jupyter, you will also need to install the SAS Scripting Wrapper for Analytics Transfer (SWAT). This package is the Python client to SAS Cloud Analytic Services (CAS). It allows users to execute CAS actions and process the results, all from Python. SWAT package information and Jupyter Notebook examples for getting started are also available from https://github.com/sassoftware.

Accessing SAS Cloud Analytic Services (CAS)

The core of SAS Viya is the analytical run-time environment called SAS Cloud Analytic Services (CAS). To execute actions or access data, you need a connection session. You can either use a binary connection (which is recommended for transferring large amounts of data) or use the REST API via HTTP or HTTPS. Since I'm analyzing a very small network for demonstration purposes, I will use the REST protocol. More information about Viya and CAS can be found in the related online documentation.

One of the first steps in any program is to define the libraries you are going to use. In Python, this is done using the import statement. Besides the very common matplotlib library, I'm also going to use networkx to render and visualize the network graphs in Python.

from swat import *
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as colors  # package includes utilities for color ranges
import matplotlib.cm as cmx
import networkx as nx  # to render the network graph
%matplotlib inline

Now that the SWAT libraries have been loaded, we can issue the first command to connect to CAS and create a session for the given user. Note that the parameters used will vary depending on your environment. The variable "s" will hold the session object and will be referenced in future calls.

s = CAS('http://sasviya.mycompany.com:8777', 8777, 'myuser', 'mypass')
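Because connection parameters differ between environments, it can help to keep them out of the notebook itself. The following is a minimal sketch of that idea, not part of the original example: the environment variable names (VIYA_CAS_HOST and so on) and both helper functions are hypothetical, and the defaults simply mirror the placeholder values used above. It assumes the SWAT CAS constructor accepts a protocol keyword to select the REST interface.

```python
import os


def cas_connection_settings(env=None):
    """Collect CAS connection settings from environment variables.

    The variable names (VIYA_CAS_HOST and so on) are hypothetical; use
    whatever naming scheme fits your environment.
    """
    env = os.environ if env is None else env
    return {
        "hostname": env.get("VIYA_CAS_HOST", "sasviya.mycompany.com"),
        "port": int(env.get("VIYA_CAS_PORT", "8777")),
        "username": env.get("VIYA_CAS_USER", "myuser"),
        "password": env.get("VIYA_CAS_PASSWORD", ""),
        # 'http' selects the REST interface rather than the binary protocol.
        "protocol": env.get("VIYA_CAS_PROTOCOL", "http"),
    }


def connect(settings):
    """Open a CAS session; needs the swat package and a reachable server."""
    from swat import CAS  # imported lazily so the helper above works without swat
    return CAS(
        settings["hostname"],
        settings["port"],
        settings["username"],
        settings["password"],
        protocol=settings["protocol"],
    )


settings = cas_connection_settings()
# Print everything except the password to confirm what will be used.
print({key: value for key, value in settings.items() if key != "password"})
# s = connect(settings)  # uncomment when a CAS server is available
```

Keeping the password in the environment rather than in the notebook also avoids committing credentials when the notebook is shared.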

Action sets

The CAS server organizes analytical actions into action sets. An action set can hold many different actions, from simple data or session management tasks to sophisticated analytical tasks. For this network analysis, I'm going to use an action set named hyperGroup that has only one action, also called hyperGroup.

s.loadactionset('hyperGroup')
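When working with an unfamiliar action set, it can be useful to look at the actions it provides right after loading it. The helper below is a small sketch with a hypothetical name (load_action_set); it assumes the session object behaves like the SWAT connection "s" created above and uses the built-in help action with its actionset parameter.

```python
def load_action_set(session, name):
    """Load a CAS action set, then request the help text for its actions.

    `session` is assumed to be a swat.CAS connection such as `s` above.
    """
    session.loadactionset(name)
    # The built-in help action describes the actions in an action set.
    return session.help(actionset=name)


# Example, assuming an open session `s`:
# load_action_set(s, 'hyperGroup')
```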

Loading data

To perform any analytical modelling, we need data. There are several options for loading data, including using an existing data set on the server or uploading a new one from the local environment. The SAS community web sites show additional examples of how data can be loaded. The following example uploads a local CSV file to the server and stores the data in a table named DRUG_NETWORK. The table has only two columns, FROM and TO, both of type numeric.

out = s.upload("data/drug_network.csv", casout=dict(name='DRUG_NETWORK', promote = True))
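Before uploading, it can be worth checking the edge list locally. The sketch below counts distinct nodes and edges in a two-column CSV file using only the standard library. The function name summarize_edge_list is hypothetical, the sample rows are made up for illustration and are not taken from the study data, and the code assumes the file has a header row naming the columns FROM and TO.

```python
import csv
import io


def summarize_edge_list(csv_file):
    """Count distinct nodes and edges in a two-column FROM,TO edge list."""
    reader = csv.DictReader(csv_file)
    nodes = set()
    edge_count = 0
    for row in reader:
        # Both columns are numeric node identifiers.
        source, target = float(row["FROM"]), float(row["TO"])
        nodes.update((source, target))
        edge_count += 1
    return len(nodes), edge_count


# Illustrative sample only; these are not rows from the study data.
sample = io.StringIO("FROM,TO\n1,2\n1,3\n2,3\n4,1\n")
node_count, edge_count = summarize_edge_list(sample)
print(node_count, edge_count)  # 4 nodes, 4 edges
```

To check the real file, open it with open("data/drug_network.csv", newline="") and pass the file object to the function. If every user appears in at least one connection, the result would be expected to match the 194 nodes and 273 edges mentioned earlier.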

During analytical modelling, you often have to change data structures or filter and merge data sources. The following lines show an example of how to execute SAS DATA step code and derive new columns. The put function here converts both numeric columns to new character columns, SOURCE and TARGET.

sasCode =
