Cainã Max Couto da Silva

Postdoctoral Data Scientist

UW-Madison

About me

As a highly skilled data scientist with a PhD in bioinformatics and over ten years of project experience, I have built a strong foundation in data science and analytics. I have spent the last few years at world-renowned companies developing end-to-end machine learning applications. Driven by my passion for knowledge, I have also taught specialized courses on various data science topics.

Interests
  • Data Science
  • Artificial Intelligence
  • Generative AI (LLMs)
  • Machine Learning Engineering
Education
  • MBA in Data Science & Analytics

    Universidade de São Paulo

  • PhD in Bioinformatics

    Universidade de São Paulo

Current experience

UW-Madison - One of the top 15 public universities in the US
Postdoctoral Data Scientist
August 2024 – Present Madison - WI, USA
  • Currently developing a RAG-based text-to-SQL AI agent that allows users to interact with the database using Slack.
  • Developed SQLDeps: an open-source Python package leveraging LLMs to automatically extract table and column dependencies and outputs from complex SQL scripts 100X faster and >300X cheaper than human expert labor.
  • Developed and optimized cattle mapping using cutting-edge deep learning models on high‑resolution satellite imagery, providing actionable intelligence to combat Amazon deforestation.
  • Engineered a pioneering machine learning classifier to assess data quality from farm properties at scale, implementing feature engineering on geospatial and entity data that enabled scalable data integrity checks for the first time.
  • Lead UW‑Madison students in research projects, mentoring on machine learning and computer vision techniques.
  • Automated database workflows, reducing manual intervention and boosting team productivity.
  • Delivered private data analysis reports for stakeholders in Brazil and the USA.

Tools: Python, SQL, GitHub/GitLab, git, AWS, LangChain, PyTorch, YOLO, LLMs, Streamlit, machine learning libraries (e.g., scikit-learn), data visualization libraries, etc.

Industry experience

Schlumberger - World's largest offshore drilling company
Data Scientist
January 2023 – July 2024 Houston - TX, USA (remote)

I developed end-to-end AI SaaS products for our internal customers.

Main activities:

  • Building predictive models using either statistical or machine learning approaches to assess the health of the company assets (tools)
  • Assessing the technical feasibility of new projects through data analysis

Deliveries:

  • Statistical models to assess asset health
  • Machine learning models to predict asset health for upcoming usage
  • A custom-trained OCR model to extract dimensions of engineering drawings

Quick facts:

  • As a result of my first project, our work was accepted for publication as a scientific paper on OnePetro.
  • I led an innovation proposal using AI, and we ranked in the top 12 out of almost 400 ideas worldwide. My colleague and I gave the final pitch to the CEOs.
  • I worked in one of the most diverse teams in the company, interacting daily with people from the US, India, Europe, and South America.

 

Tools: Dataiku, GCP, SQL, Python, Dash, Streamlit, machine learning libraries (e.g., scikit-learn), data visualization libraries.

DNC - Edtech
Data Science Consultant (Education)
October 2021 – July 2024 São Paulo - SP, Brazil (remote)

I worked in multiple roles: facilitator, mentor, and consultant/instructor.

As a consultant/instructor, I prepared the course modules and recorded classes. I recorded four modules: statistics, data cleaning/wrangling, clustering, and model deployment. I also designed data science activities on descriptive and inferential statistics, unsupervised machine learning models, MLflow, and big data with PySpark, among others.

As a mentor, I assessed student reports and addressed their questions through Q&A sessions, aiding in academic and real-world projects.

As a facilitator, I participated in data science activities from exploratory analysis to model deployment. I also prepared some of those activities.

Ambev Tech - World's largest brewing company
Data Scientist
May 2022 – December 2022 São Paulo - SP, Brazil (remote)

Squad: Revenue Management

Activities / Deliverables:

  • Rule-based automation of pipelines for price engines using PySpark (big data)
  • Statistical model to identify price changepoints for several SKU categories
  • Monthly time series modeling of sales volume using state-of-the-art forecasting and hierarchical reconciliation methods
  • Training for data scientists

 

Tools: • Python • PySpark • MLflow • Scikit-learn • PyCaret • Statsmodels • Forecasting frameworks (Prophet, NeuralProphet, StatsForecast)

Remessa Online - Fintech
Data Scientist
October 2021 – April 2022 São Paulo - SP, Brazil (remote)

Activities:

  • Advanced statistical modeling: exploratory data analysis, hypothesis testing, time series forecasting, predictive analyses (e.g., regression and classification), customer and product segmentation, etc.

Deliverables:

  • Multiple time series forecasting for customer segments (weekly and monthly)
  • Model to predict the probability of customer recurrence
  • In-depth churn analysis using inferential statistics

 

Tools: • Python • PySpark • MLflow • Scikit-learn • PyCaret • CatBoost • Prophet • AWS • GitHub • Data visualization libraries (Plotly, Seaborn, Matplotlib)

Eli Lilly
Safety Data Sciences Associate
June 2021 – October 2021 Indianapolis - IN, USA (remote)

Activities:

  • Queries and reports for worldwide company members

Academic experience

Universidade de São Paulo
MBA in Data Science & Analytics
May 2021 – August 2023 São Paulo - SP, Brazil

Grade: 10

  • In‑depth study of machine learning models.
  • Developed an end‑to‑end hybrid ML model for churn prediction.
  • Project code repository

Universidade de São Paulo
PhD Researcher
July 2016 – April 2021 São Paulo - SP, Brazil

Thesis: Identifying natural selection in Native American populations. Supported by: CAPES (2016 - 2018) and FAPESP (2018 - 2020)

Activities:

  • Data analysis, visualization, and scientific reporting of genetic data using R, Python, and bash scripting.
  • Application of unsupervised algorithms (e.g., PCA) and of descriptive and inferential statistics.

Deliverables:

  • Internal R package with customized functions to facilitate multiple analyses
  • Three scientific papers published in international journals
  • Thesis code repository

I presented our preliminary work at international conferences, including in the USA. I also completed a three-month internship in Barcelona, Spain.

Universidade de São Paulo
Master of Science
April 2014 – March 2016 São Paulo - SP, Brazil

Dissertation: Role of cellular prion protein and its ligand, STIP1, in adult neurogenesis. Supported by: CNPq (2014 - 2016)

Main techniques: primary cell culture, immunofluorescence, and hypothesis testing.

Universidade de São Paulo
Bachelor in Biological Sciences
February 2011 – December 2013 São Paulo - SP, Brazil

Monograph: Role of the interaction between the cellular prion protein and its ligand, STI1, in the biology of neural precursors from the murine adult brain. Supported by: University scholarship (2011 - 2012) and FAPESP (2012 - 2013)

Honors & Awards:

  • Best academic performance
  • Professor Bertha Lange de Morretes’ Award

Publications

Contact

Feel free to contact me through this form: