01 Solution 1

Graph Applications

Cloud Security

Tools & Specialties

  • Solution Architecture
  • Data Visualization
  • Machine Learning
  • Graph Database
  • AWS
  • Cloud Security
  • Python

Cloud Security Graph Solution

While working at IHS Markit, I developed a graph-based cloud security tool, delivered as a SaaS application hosted on AWS. The tool stored all of the cloud security telemetry in the cloud and represented the technical relationships involved. I was responsible for the data structure, data pipeline, data schema, and data continuity, and I built the graph behind this solution from the ground up using AWS Neptune. I also engineered the communication between Neptune and SageMaker so that machine learning could be run on the graph database. I selected, trained, and deployed machine learning models that detected anomalies or measured the risk of compromise for any compute instance in the company's cloud network. These solutions had immediate business impact because stakeholders were able to prioritize their work based on the results. Each solution in this project kept the same main objectives: automation, optimization, budget reduction, and meaningful business impact. You can see documentation here.
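As a rough illustration of why a graph suits this kind of telemetry, the sketch below stores "can reach" relationships between cloud resources and counts how many hops each one sits from the public internet. It is a toy in-memory example in plain Python, not the Neptune schema or the production risk model; every resource name, the `EDGES` mapping, and the `hops_from` helper are hypothetical.

```python
from collections import deque

# Hypothetical "source can reach target" relationships between cloud resources.
EDGES = {
    "internet": ["load-balancer"],
    "load-balancer": ["web-instance"],
    "web-instance": ["app-instance"],
    "app-instance": ["database"],
    "batch-instance": ["database"],
}

def hops_from(graph, start):
    """Breadth-first search: minimum number of hops from start to each reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

# Resources missing from the result cannot be reached from the internet at all.
exposure = hops_from(EDGES, "internet")
for resource, hops in sorted(exposure.items(), key=lambda item: item[1]):
    print(f"{resource}: {hops} hop(s) from the internet")
```

A hop count like this is only one possible input to a risk measure; in a graph database the same question would be expressed as a traversal query rather than a Python loop.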

02 Solution 2

Natural Language Processing Applications

Human Rights Issues

Tools & Specialties

  • Solution Architecture
  • Artificial Intelligence
  • Human-Computer Interaction
  • Predictive Analytics
  • Data Visualization
  • Palantir Foundry
  • Python

NLP Applications

My work on human rights issues has opened up a world of text problems, many of which can be explored using Natural Language Processing (NLP). NLP is concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. Text data can be viewed as a pool of words and phrases, and NLP attempts to give this unstructured data structure. I have detailed the techniques I explored, along with the enablement solutions I architected within Palantir Foundry.
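One of the simplest ways to give text structure is a bag-of-words representation, where each document becomes a vector of word counts over a shared vocabulary. The sketch below uses only the Python standard library; the sample sentences and the `tokenize` and `bag_of_words` helpers are illustrative assumptions, not the techniques covered in the linked documentation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(documents):
    """Return a sorted shared vocabulary and one count vector per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    vectors = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, vectors

# Hypothetical sample documents.
docs = [
    "Freedom of speech is a human right.",
    "The right to education is a human right.",
]
vocabulary, vectors = bag_of_words(docs)
for doc, vector in zip(docs, vectors):
    print(vector, doc)
```

Once text is in this tabular form, it can be fed to standard models, although word order and context are lost.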

Please visit the documentation provided here for more details.

03 Solution 3

Regression Applications

Environmental Issues

Tools & Specialties

  • Machine Learning
  • Predictive Analytics
  • Data Visualization
  • Published
  • Python

Firefighting Paper

I delivered an end-to-end machine learning research solution on the topic "Predicting Number of Personnel to Deploy for Wildfire Containment." The paper was submitted to and accepted by the International Conference on Data Science (ICData) for publication in the research book Transactions on Computational Science & Computational Intelligence on May 23, 2021. I have provided the manuscript as well as the abstract below.

Abstract: Wildfire size, frequency, severity, and associated fatalities have surged at an alarming rate over the past 25 years, resulting in steep budget increases. According to USDA, the annual budget of U.S. Forest Service devoted to wildfires had more than tripled, jumping from 16% to 52% between 1995 and 2015 and it exceeded $2 billion in 2017. Under the current budget and capacity constraints, allocating correct amount of personnel and equipment to a fire timely is vital in suppressing fire, reducing costs, and saving lives. In this paper, we use gradient boosting decision trees to predict the number of personnel needed to effectively fight a wildfire. By combining the US wildland fire incident data and weather data, our model obtained a coefficient of determination (R²) of 77.78%, a significant improvement over historical solution. Our model can potentially be used to provide decision support for those units who fight wildfires.
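For readers unfamiliar with the metric quoted in the abstract, the coefficient of determination compares a model's squared errors with those of always predicting the mean: R² = 1 − SS_res / SS_tot. The sketch below computes it in plain Python. The personnel counts are made-up numbers for illustration only and are not data or results from the paper.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are identical")
    return 1 - ss_res / ss_tot

# Hypothetical personnel counts (actual) and model outputs (predicted).
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
score = r_squared(actual, predicted)
print(f"R^2 = {score:.3f}")
```

A value of 1 means the predictions match the observations exactly, and a value of 0 means the model does no better than predicting the mean.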

04 Solution 4

Other Applications

M.S. Data Science - Lipscomb University 2020

Tools & Specialties

  • Machine Learning
  • Predictive Analytics
  • Data Visualization
  • Python
  • Clustering
  • R

An Exploration with R

In an exploration of R, I examined how well spectral clustering compares to other well-known clustering methods. The algorithms compared were K-Means, K-Medoids, and Spectral Clustering, with the goal of determining whether a randomized dataset could be successfully clustered. After the R code was run on the chosen Wine dataset, the accuracy function and p-value showed quantifiably that K-Medoids and Spectral Clustering did a better job of correctly clustering the data than K-Means. The most important takeaway is that clustering work should never rely on a single algorithm every time. A data scientist should try several methods to make sure they arrive at the best clusters for the data they have. You can explore this analysis further here.
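Scoring a clustering against known classes takes one extra step, because cluster IDs are arbitrary: cluster 2 may correspond to class "A". A common approach is to try every one-to-one mapping of cluster IDs to class labels and keep the best match rate. The sketch below shows that idea in Python (the original analysis was written in R). The `clustering_accuracy` helper and the sample labels are assumptions for illustration, not the accuracy function used in the analysis.

```python
from itertools import permutations

def clustering_accuracy(true_labels, cluster_ids):
    """Best fraction of points matched over all one-to-one cluster-to-class mappings."""
    if not true_labels or len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must be non-empty and the same length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best_hits = 0
    # Brute force is fine for a handful of clusters (3 classes give only 6 mappings).
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)

# Hypothetical example: three classes, one point placed in the wrong cluster.
truth = ["A", "A", "A", "B", "B", "C", "C", "C"]
found = [2, 2, 2, 0, 0, 1, 1, 0]
print(clustering_accuracy(truth, found))
```

Running the same scoring function over the output of each algorithm gives a like-for-like comparison, which is the kind of check that supports trying more than one clustering method.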

Other M.S.D.S. Projects

Please see my GitHub account here for my experience with Feature Engineering, Dimensionality Reduction, Statistical Methods, Data Visualization, and Model Interpretability.