Cainã Max Couto da Silva

Data Science Researcher

UW-Madison

About me

As a highly skilled data scientist with a PhD in bioinformatics and over ten years of experience on relevant projects, I have built a strong foundation in data science and analytics. I have spent the last few years at world-renowned companies, developing end-to-end machine learning applications. Driven by my passion for knowledge, I have also taught specialized courses on various data science topics.

Interests
  • Data Science
  • Machine Learning
  • Deep Learning
  • Artificial Intelligence
Education
  • MBA in Data Science & Analytics, 2023

    Universidade de São Paulo

  • PhD in Science, 2021

    Universidade de São Paulo

  • MSc in Science, 2016

    Universidade de São Paulo

Skills

I develop end-to-end AI solutions
Programming
Python

Pandas, NumPy, Scikit-learn, PyCaret, Matplotlib, Seaborn, Plotly, Folium, BeautifulSoup, Selenium, etc.

SQL

Create, modify, and retrieve data from relational database management systems (e.g., MySQL, Postgres)

R

Base R, data.table, tidyverse (e.g., dplyr, ggplot2), plotly, R Markdown, Bioconductor packages, and more…

Bash scripting

All essential bash commands, plus some knowledge of awk and sed.

Technologies
Version control

Familiarity with git and GitHub (using git for all projects)

Cloud Platforms

Worked with AWS and GCP (e.g., data storage, BigQuery, and Vertex AI).

APIs

API requests and API development using either Flask or FastAPI

Docker

Create images for models and applications

Spark

Process big data with Spark SQL and build ML models with Spark MLlib

Capabilities
Data preprocessing

Good problem-solver for data wrangling challenges. Familiar with data cleaning and feature-engineering for ML tasks.

Data visualization

Advanced experience building data visualizations using Python and R (especially static figures, but I’m also familiar with interactive approaches).

Statistical Analysis

Descriptive and inferential statistics, applying theory to practical problems and to the AI projects I have worked on.

Machine Learning

Build ML models for regression, classification, and clustering problems, as well as recommender systems and time series models. I’m also familiar with AutoML.

Deep Learning

Currently studying and applying TensorFlow (Keras) and PyTorch

Model deployment

I’ve deployed models using Databricks, Dataiku, MLflow, Docker, and FastAPI/Flask, alongside version control and coding best practices.

Current experience

UW-Madison - A top-15 public university in the US
Data Scientist & Postdoctoral Researcher
August 2024 – Present Madison - WI, USA

Main activities:

  • Support database development, statistical analysis, and AI applications; write up research results; and travel to Brazil to represent UW in meetings with stakeholders.

Deliverables:

  • Automation of multiple database development processes, reducing the time spent on manual intervention.
  • An ML classifier to assess data quality at scale, enabling scalable quality checks for the first time.
  • An enhanced deep learning model to count cattle in protected areas using high-resolution imagery. Developed in collaboration with Google, this project empowers Brazilian prosecutors to act against non-compliant farmers, helping to reduce deforestation in the Amazon.

Industry experience

Schlumberger - World’s largest offshore drilling company
Data Scientist
January 2023 – July 2024 Houston - TX, USA (remote)

I developed end-to-end AI SaaS products for our internal customers.

Main activities:

  • Building predictive models using either statistical or machine learning approaches to assess the health of the company’s assets (tools)
  • Assessing the technical feasibility of new projects through data analysis

Deliverables:

  • Statistical models to assess asset health
  • Machine learning models to predict asset health for upcoming usage
  • A custom-trained OCR model to extract dimensions of engineering drawings

Quick facts:

  • As a result of my first project, our work was accepted for publication as a scientific paper on OnePetro.
  • I led an innovation proposal using AI, and we ranked in the top 12 out of almost 400 ideas worldwide. My colleague and I gave the final pitch to the CEOs.
  • I worked in one of the most diverse teams in the company, interacting daily with people from the US, India, Europe, and South America.

Tools: Dataiku, GCP, SQL, Python, Dash, Streamlit, machine learning libraries (e.g., scikit-learn), data visualization libraries.

DNC - Edtech
Data Science Consultant (Education)
October 2021 – July 2024 São Paulo - SP, Brazil (remote)

I worked in multiple roles: facilitator, mentor, and consultant/instructor.

As a consultant/instructor, I prepared course modules and recorded classes for four of them: statistics, data cleaning/wrangling, clustering, and model deployment. I also devised data science activities on descriptive and inferential statistics, unsupervised machine learning models, MLflow, and big data with PySpark, among others.

As a mentor, I assessed student reports and answered questions in Q&A sessions, supporting both academic and real-world projects.

As a facilitator, I participated in data science activities from exploratory analysis to model deployment. I also prepared some of those activities.

Ambev Tech - World’s largest beer brewing company
Data Scientist
May 2022 – December 2022 São Paulo - SP, Brazil (remote)

Squad: Revenue Management

Activities / Deliverables:

  • Rule-based automation of pipelines for price engines using PySpark (big data)
  • Statistical model to identify price changepoints for several SKU categories
  • Monthly time series modeling of sales volume using state-of-the-art forecasting and hierarchical reconciliation methods
  • Training for data scientists

Tools: Python, PySpark, MLflow, Scikit-learn, PyCaret, Statsmodels, forecasting frameworks (Prophet, NeuralProphet, StatsForecast).

Remessa Online - Fintech
Data Scientist
October 2021 – April 2022 São Paulo - SP, Brazil (remote)

Activities:

  • Advanced statistical modeling: exploratory data analysis, hypothesis testing, time series forecasting, predictive analyses (e.g., regression and classification), customer and product segmentation, etc.

Deliverables:

  • Multiple time series forecasting for customer segments (weekly and monthly)
  • Model to predict the probability of customer recurrence
  • In-depth churn analysis using inferential statistics

Tools: Python, PySpark, MLflow, Scikit-learn, PyCaret, CatBoost, Prophet, AWS, GitHub, data visualization libraries (Plotly, Seaborn, Matplotlib).

Eli Lilly
Safety Data Sciences Associate
June 2021 – October 2021 Indianapolis - IN, USA (remote)

Activities:

  • Queries and reports for worldwide company members

Academic experience

Universidade de São Paulo
MBA in Data Science & Analytics
May 2021 – August 2023 São Paulo - SP, Brazil

Grade: 10

  • In‑depth study of machine learning models.
  • Developed an end‑to‑end hybrid ML model for churn prediction.
  • Project code repository

Universidade de São Paulo
PhD Researcher
July 2016 – April 2021 São Paulo - SP, Brazil

Thesis: Identifying natural selection in Native American populations. Supported by: CAPES (2016 - 2018) and FAPESP (2018 - 2020)

Activities:

  • Data analysis, visualization, and scientific reporting of genetic data using R, Python, and bash scripting.
  • Application of unsupervised algorithms (e.g., PCA) and of descriptive and inferential statistics.

Deliverables:

  • Internal R package with customized functions to facilitate multiple analyses
  • Three scientific papers published in international journals
  • Thesis code repository

I presented our preliminary work at international conferences, including in the USA. I also completed a three-month internship in Barcelona, Spain.

Universidade de São Paulo
Master in Science
April 2014 – March 2016 São Paulo - SP, Brazil

Dissertation: Role of cellular prion protein and its ligand, STIP1, in adult neurogenesis. Supported by: CNPq (2014 - 2016)

Main techniques: primary cell culture, immunofluorescence, and hypothesis testing.

Universidade de São Paulo
Bachelor in Biological Science
February 2011 – December 2013 São Paulo - SP, Brazil

Monograph: Role of the interaction between the cellular prion protein and its ligand, STI1, in the biology of neural precursors from the murine adult brain. Supported by: University scholarship (2011 - 2012) and FAPESP (2012 - 2013)

Honors & Awards:

  • Best academic performance
  • Professor Bertha Lange de Morretes’ Award

Publications
