Hello, I'm

Shivani Soman.

Data Analytics Developer & Data Scientist.

About Me

Hi! I'm Shivani Soman, a Data Scientist based in New York City, NY.


Working in finance, I translate seemingly uninformative data into valuable, actionable insights. I build concise, insightful reporting dashboards and use machine learning solutions to improve clients' decision-making power. My goal is to build innovative data-driven solutions that help optimize everyday processes.


In my free time, I enjoy impressing people with my ukulele skills, playing video games and trying to get my Rubik's Cube solve time under 30 seconds.

Experience

Barclays Investment Bank

Data Analytics Developer / Data Scientist

  • Currently working in the Sales and CRM Analytics division in the Investment Banking sector at Barclays.
  • Analyzed large volumes of unstructured financial data from multiple sources and created concise, insightful reporting dashboards and visualizations that enable senior Sales and Client Strategy Managers to make data-driven decisions.
  • Created a Hierarchical DBSCAN-based machine learning model to cluster similar clients for peer analysis, using features including Revenue, Wallet Size, Risk Weighted Assets, Balances and Resource Consumption.
  • Implemented topic modeling of millions of user-entered comments using Natural Language Processing and Latent Dirichlet Allocation (LDA) to group comments into categories for efficient organization and analysis.
  • Improved external-to-internal contact matching for a Unified CRM project by 300% over previous techniques using Levenshtein distance-based fuzzy matching, including nickname and phonetic matching.
  • Pioneered the creation of QlikSense extensions using Flask, JavaScript and Python to enable fetch/writeback to SQL Server.

Barclays Investment Bank

Summer Developer Analyst

  • Worked in the Risk Analytics, Finance and Treasury sector as part of my summer internship.
  • Developed part of the Client Trade Clearing Hub system, analyzing and automating report generation with Java, SQL and Bash scripting to greatly reduce user overhead on client trades.

University of California, Los Angeles

Graduate Teaching Assistant

  • Taught Statistics for Life Sciences (LS40). Topics included null hypothesis testing, p-values, paired tests, F-values, chi-squared tests, linear regression, and data analytics and visualization using Python.
  • Conducted 2-hour lab sessions twice a week and was responsible for grading bi-weekly assignments, midterms and finals.

Education

University of California, Los Angeles

Sept 2017 - Dec 2018

Master of Science in Computer Science

Relevant Coursework: Big Data Analytics, Database Management and Statistical Computing, Bioinformatics, Large Scale Data Mining, Learning and Reasoning with Bayesian Networks, Health Analytics. GPA: 3.93

Maharashtra Institute of Technology, Pune

June 2013 - May 2017

Bachelor of Engineering in Computer Engineering

Relevant Coursework: Data Mining Techniques and Applications, Business Intelligence, Machine Learning, Data Structures and Algorithms, Database Systems, Operating Systems, Computer Networks. Final Percentage: 71% (First Class with Distinction)

Projects

Music Genre Classification using Spectrograms and MFCC Features

Developed a two-fold method to classify music into 10 genres, using Convolutional Recurrent Neural Networks (CRNN) for spectrogram analysis and traditional machine learning classifiers on Mel-Frequency Cepstral Coefficients (MFCCs) derived from the audio samples, achieving an accuracy of 86%.

Location Prediction (Big Data Metagenomic Classification)

Performed predictive analysis on more than 3 TB of metagenomic data to predict the origin of each metagenomic sample using neural networks and XGBoost.

Activity Monitoring using LSTM (Health Analytics using SmartWatch)

Built a deep learning model using Long Short-Term Memory networks (LSTMs) on data collected from smartwatch sensors, such as the accelerometer and gyroscope, to predict the activity the user was performing, such as standing, sitting or walking.

Twitter Popularity Prediction – Super Bowl 2015 Team Sentiment Analysis

Analyzed millions of tweets posted before and during Super Bowl XLIX (2015) to determine how and why public sentiment towards the New England Patriots and the Seattle Seahawks changed over the course of the championship match.

Personalized Medicine - Redefining Cancer Treatment (Kaggle)

Developed a machine learning model using NLP that classifies genetic mutations of cancer genes into a set of predefined classes, drawing on an expert-annotated knowledge base and text-based clinical literature. Achieved a better score than the top-ranked entry on the leaderboard of this Kaggle competition using Word2Vec embeddings and LightGBM.

Skills

Get in Touch