Cainã Max Couto da Silva

Postdoctoral Data Scientist

UW-Madison

About me

As a highly skilled data scientist with a PhD in bioinformatics and over ten years of working on relevant projects, I have developed a strong foundation in data science and analytics. I have spent the last few years at world-renowned companies, developing end-to-end machine learning applications. Driven by a passion for knowledge, I have also taught specialized courses on various data science topics.

Interests
  • Data Science
  • Artificial Intelligence
  • Machine Learning & Deep Learning
  • Generative AI (LLMs)
Education
  • MBA in Data Science & Analytics

    Universidade de São Paulo

  • PhD in Bioinformatics

    Universidade de São Paulo

  • MSc in Science, 2016

    Universidade de São Paulo

Current experience

UW-Madison - Top 15 public university in the US
Data Scientist & Postdoctoral Researcher
August 2024 – Present Madison - WI, USA

Main activities:

  • Support database development, statistical analysis, and AI applications; write up research results; and travel to Brazil to represent UW in meetings with stakeholders.

Deliverables:

  • Automation of multiple processes in database development, reducing the time spent on manual intervention.
  • An ML classifier to assess data quality at scale, enabling scalable quality checks for the first time.
  • Enhancement of a deep learning model to count cattle in protected areas using high-resolution imagery. In collaboration with Google, this project empowers Brazilian prosecutors to act against non-compliant farmers, thereby helping reduce deforestation in the Amazon.

Industry experience

Schlumberger - World's largest offshore drilling company
Data Scientist
January 2023 – July 2024 Houston - TX, USA (remote)

I developed end-to-end AI SaaS products for our internal customers.

Main activities:

  • Building predictive models using either statistical or machine learning approaches to assess the health of company assets (tools)
  • Assessing the technical feasibility of new projects through data analysis

Deliverables:

  • Statistical models to assess asset health
  • Machine learning models to predict asset health for upcoming usage
  • A custom-trained OCR model to extract dimensions from engineering drawings

Quick facts:

  • As a result of my first project, our work was accepted for publication as a scientific paper on OnePetro.
  • I led an innovation proposal using AI, and we ranked in the top 12 out of almost 400 ideas worldwide. My colleague and I gave the final pitch to the CEOs.
  • I worked in one of the most diverse teams in the company, interacting daily with people from the US, India, Europe, and South America.

Tools: Dataiku, GCP, SQL, Python, Dash, Streamlit, machine learning libraries (e.g., scikit-learn), data visualization libraries.

DNC - Edtech
Data Science Consultant (Education)
October 2021 – July 2024 São Paulo - SP, Brazil (remote)

I worked in multiple roles: facilitator, mentor, and consultant/instructor.

As a consultant/instructor, I prepared course modules and recorded classes. I recorded four modules: statistics, data cleaning/wrangling, clustering, and model deployment. I also devised data science activities on descriptive and inferential statistics, unsupervised machine learning models, MLflow, and big data with PySpark, among others.

As a mentor, I assessed student reports and addressed their questions through Q&A sessions, aiding in academic and real-world projects.

As a facilitator, I participated in data science activities from exploratory analysis to model deployment. I also prepared some of those activities.

Ambev Tech - World's largest brewer
Data Scientist
May 2022 – December 2022 São Paulo - SP, Brazil (remote)

Squad: Revenue Management

Activities / Deliverables:

  • Rule-based automation of pipelines for price engines using PySpark (big data)
  • Statistical model to identify price changepoints for several SKU categories
  • Monthly time series modeling of sales volume using state-of-the-art forecasting and hierarchical reconciliation methods
  • Training for data scientists


Tools: • Python • PySpark • MLflow • scikit-learn • PyCaret • statsmodels • Forecasting frameworks (Prophet, NeuralProphet, StatsForecast)

Remessa Online - Fintech
Data Scientist
October 2021 – April 2022 São Paulo - SP, Brazil (remote)

Activities:

  • Advanced statistical modeling: exploratory data analysis, hypothesis testing, time series forecasting, predictive analyses (e.g., regression and classification), customer and product segmentation, etc.

Deliverables:

  • Multiple time series forecasting for customer segments (weekly and monthly)
  • Model for predicting the probability of customer recurrence
  • In-depth churn analysis using inferential statistics


Tools: • Python • PySpark • MLflow • scikit-learn • PyCaret • CatBoost • Prophet • AWS • GitHub • Data visualization libraries (Plotly, Seaborn, Matplotlib)

Eli Lilly
Safety Data Sciences Associate
June 2021 – October 2021 Indianapolis - IN, USA (remote)

Activities:

  • Queries and reports for worldwide company members

Academic experience

Universidade de São Paulo
MBA in Data Science & Analytics
May 2021 – August 2023 São Paulo - SP, Brazil

Grade: 10

  • In‑depth study of machine learning models.
  • Developed an end‑to‑end hybrid ML model for churn prediction.
  • Project code repository
 
Universidade de São Paulo
PhD Researcher
July 2016 – April 2021 São Paulo - SP, Brazil

Thesis: Identifying natural selection in Native American populations. Supported by: CAPES (2016 - 2018) and FAPESP (2018 - 2020)

Activities:

  • Data analysis, visualization, and scientific reporting of genetic data using R, Python, and bash scripting.
  • Application of unsupervised algorithms (e.g., PCA) and descriptive and inferential statistics.

Deliverables:

  • Internal R package with customized functions to facilitate multiple analyses
  • Three scientific papers published in international journals
  • Thesis code repository

I presented our preliminary work at international conferences, including in the USA. I also completed a three-month internship in Barcelona, Spain.

Universidade de São Paulo
Master of Science
April 2014 – March 2016 São Paulo - SP, Brazil

Dissertation: Role of cellular prion protein and its ligand, STIP1, in adult neurogenesis. Supported by: CNPq (2014 - 2016)

Main techniques: primary cell culture, immunofluorescence, and hypothesis testing.

Universidade de São Paulo
Bachelor's in Biological Sciences
February 2011 – December 2013 São Paulo - SP, Brazil

Monograph: Role of the interaction between the cellular prion protein and its ligand, STI1, in the biology of neural precursors from the murine adult brain. Supported by: University scholarship (2011 - 2012) and FAPESP (2012 - 2013)

Honors & Awards:

  • Best academic performance
  • Professor Bertha Lange de Morretes’ Award

Publications

Contact

Feel free to contact me through this form: