Research Data Scientist
Mass General Brigham
Mass General Brigham relies on a wide range of professionals, including doctors, nurses, business people, tech experts, researchers, and systems analysts, to advance our mission. As a not-for-profit, we support patient care, research, teaching, and community service, striving to provide exceptional care. We believe that high-performing teams drive groundbreaking medical discoveries, and we invite all applicants to join us and experience what it means to be part of Mass General Brigham.
Our research project seeks to reduce the resources required to obtain data from electronic medical records (EMRs) by developing an open-source large language model (LLM) for natural language querying. This role is essential to the project's success and requires a data science professional with specialized expertise in natural language processing (NLP) and machine learning (ML). The data scientist will be responsible for critical tasks, including acquiring data from unstructured clinical documentation, designing and training the generative LLM, and evaluating model performance. Because the project is grant-funded with a two-year timeline, this is a fixed-term position supporting the specific aims outlined in the research plan.
Job Summary
Responsible for analyzing data, uncovering underlying patterns and logic, and developing data-driven applications. Works across the entire problem-solving cycle: finding and prioritizing problems, researching the best algorithms to solve them, and designing and implementing robust, practical solutions.
Does this position require Patient Care?
No
Essential Functions
- Analyze complex, high-dimensional clinical and operational data for dependencies, patterns, outliers, inaccuracies, and validity.
- Apply knowledge of statistics, machine learning, programming, data modeling, simulation, and/or advanced mathematics to recognize patterns, identify opportunities, pose business questions, and make valuable discoveries.
- Contribute to the design and evaluation of data-driven metrics, such as physician/facility performance criteria, bottleneck metrics, productivity limits, and processing delay/error reports.
- Use a flexible, analytical approach to design, develop, and evaluate predictive models and advanced algorithms that extract maximum value from the data.
- Generate and test hypotheses, and analyze and interpret the results.
Qualifications
Education
Bachelor's Degree in a related field of study required; Master's Degree in a related field of study preferred
Can this role accept experience in lieu of a degree?
Yes
Licenses and Credentials
Experience
2-3 years of experience in data science positions working with large data sets preferred
Knowledge, Skills and Abilities
- Extensive knowledge of natural language processing (NLP) and large language models (LLMs).
- Proficiency with deep learning frameworks such as PyTorch or TensorFlow.
- Ability to create reports and dashboards with Tableau.
- Skilled in data analysis with Python and SQL.
- Knowledge of statistics.
- Knowledge of machine learning.
- SQL database management and administration knowledge preferred.
- Practical knowledge of numerical optimization algorithms.
Additional Job Details (if applicable)
Remote Type
Work Location
Scheduled Weekly Hours
Employee Type
Work Shift
Pay Range
$73,798.40 - $107,400.80/Annual
Grade
6
EEO Statement:
Mass General Brigham Competency Framework
At Mass General Brigham, our competency framework defines what effective leadership “looks like” by specifying which behaviors are most critical for successful performance at each job level. The framework comprises ten competencies (half People-Focused, half Performance-Focused), each defined by observable and measurable skills and behaviors that contribute to workplace effectiveness and career success. These competencies are used to evaluate performance, make hiring decisions, identify development needs, mobilize employees across our system, and establish a strong talent pipeline.