When working with cloud-born applications it is sometimes nice to work without any local files. In my case I was building a Python pipeline to preprocess data before doing some Machine Learning with it. My Python code lives in a Jupyter notebook hosted by Azure Machine Learning Studio.
As my data lives in Azure Blob Storage (the fast and cheap generic storage for files in the Microsoft cloud) I wanted to write Python scripts that read from blob storage and write back to blob storage without any local temp files. As the official documentation is not very clear (at least I find some parts confusing), I will share some bits of Python code that work for me. Obviously this is all at your own risk, and I cannot guarantee that this solution is stable or that it is the only or best way to do this.
# connect to your storage account
# depending on your SDK version the import may instead be
# from azure.storage.blob import BlobService
from azure.storage import BlobService
blob_service = BlobService(account_name='YourAccountName', account_key='YourKey')
# list all CSV files in your storage account
# list_blobs returns results in pages, so keep fetching
# until the continuation marker is empty
blobs = []
marker = None
while True:
    batch = blob_service.list_blobs('YourContainer', marker=marker, prefix='input_')
    blobs.extend(batch)
    if not batch.next_marker:
        break
    marker = batch.next_marker
for blob in blobs:
    print(blob.name)
# read the blob file as a text file
# I just read in the first one from the previous list
# splitlines() also copes with Windows-style \r\n line endings
data = blob_service.get_blob_to_text('YourContainer', blobs[0].name).splitlines()
print("Number of lines in CSV: " + str(len(data)))
# do your stuff
# I want to filter out some lines of my CSV and only keep those having ABC or DEF in them
# note that the substring check is case-sensitive
matchers = ['ABC', 'DEF']
matching = [s for s in data if any(xs in s for xs in matchers)]
print("Number of lines in CSV after filtering: " + str(len(matching)))
# write your text directly back to blob storage
# join the filtered lines back together with newlines, since splitlines() removed them
blob_service.put_block_blob_from_text(
    'YourContainer',
    'YourOutputFile.csv',
    '\n'.join(matching),
    x_ms_blob_content_type='text/csv'
)
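The BlobService API shown above comes from the old SDK; Microsoft has since replaced it with the azure-storage-blob (v12) package. For reference, here is a sketch of what I believe the same round trip looks like there, assuming azure-storage-blob 12.x is installed; the account, key, container and file names are placeholders as above.

# sketch: the same read/filter/write round trip with azure-storage-blob v12
from azure.storage.blob import BlobServiceClient, ContentSettings

service = BlobServiceClient(
    account_url='https://YourAccountName.blob.core.windows.net',
    credential='YourKey'
)
container = service.get_container_client('YourContainer')

# paging is handled for you; name_starts_with replaces the prefix argument
names = [b.name for b in container.list_blobs(name_starts_with='input_')]

# download the first blob as text, entirely in memory
text = container.download_blob(names[0], encoding='utf-8').readall()
lines = text.splitlines()

matching = [s for s in lines if any(xs in s for xs in ['ABC', 'DEF'])]

# upload the filtered lines back, again without a temp file
container.upload_blob(
    'YourOutputFile.csv',
    '\n'.join(matching),
    overwrite=True,
    content_settings=ContentSettings(content_type='text/csv')
)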