Data Scientist (Python, LLM, NLP, GenAI)

Full time

Employment Information


Norstella is a group of prominent pharmaceutical solutions providers – Evaluate, MMIT, Panalgo, The Dedham Group, Citeline – that help clients navigate complexities at each step of the drug development life cycle, from pipeline to patient. For more information, please visit

Evaluate is a global company providing outstanding market intelligence services for the Pharmaceutical, Medical Device, Financial and Consulting sectors, through the Evaluate Pharma®, Evaluate Medtech®, Evaluate Omnium and Evaluate Vantage® online brands. Our international clients in Pharma and Biotech, Medtech, Banking and Consultancy regard Evaluate Pharma® as the industry’s gold standard for timely and accurate analysis of reported drug sales, consensus sales forecasts, R&D pipeline, markets and comprehensive company financials.

The team

In this role as a Data Scientist you will report to the Lead Data Scientist for Strategic Intelligence and Market Access, within the Data Science Department. You will design and deploy cutting-edge models to support the creation of new products. We work in small multi-functional teams of pharmaceutical industry experts, R&D specialists, data engineers and data scientists who rapidly prototype new visualisations and interactive reports using both our existing and newly acquired datasets.

Scope of the role

In this role as a Data Scientist you will:

  • Apply your fundamental knowledge of machine learning, generative AI, and other DS techniques to identify opportunities and help shape product development
  • Use Python and AWS SageMaker to create knowledge generation algorithms applied to pharmaceutical products
  • Visualise and analyse the performance of your models with common Python libraries
  • Work with data engineers and DevOps to deploy your models into production environments
  • Contribute to our growing in-house data science library of useful functions, data handlers, APIs and other modules

How you’ll succeed

Ultimately our goal is to smooth patient access to life-saving therapies. You will work with R&D pharma specialists to understand a problem that is hindering the development and release of effective new pharma products, and that we believe we can help solve. After understanding the problem you will conceptualise potential solutions; in the past our solutions have involved classical machine learning, fine-tuning large language models, or simply well-designed data transformations and business logic – we’re about elegant solutions to problems rather than the technology used.

After conceiving potential solution(s), you will work with data engineering to collate data from a wide variety of sources and develop your algorithm as a proof-of-concept. You will deliver indicative results from your PoC into datasets for visualisation and exploration by the broader multi-functional team. You will also perform code reviews with the data science team to explain your approach and its strengths and weaknesses.

After iterating the design with the multi-functional team as part of customer-led product development, you might convert your prototype into a full product. This will involve productionising your code to a high standard, containerisation, and deployment of your algorithm, usually as an API in AWS SageMaker. Over time you may revisit this product, re-evaluate its performance, and retrain/improve as required.

Essential Requirements

  • 4+ years of work-related experience with Python and core data science libraries including pandas, numpy, sklearn, scipy, CatBoost, XGBoost and other similar libraries
  • Experience deploying code in the cloud, preferably AWS
  • Ability to design and iterate creative solutions in Jupyter notebooks
  • Ability to convert successful code into well-engineered packages with appropriate use of in-built classes, modules, high/mid/low level functions, and other Python best practices
  • Excellent statistical knowledge, especially in relation to training dataset weaknesses and DS model scoring
  • Ability to work with stakeholders to manage your projects independently
  • Ability to explain your technical decisions on a project to non-data scientists
  • Degree at Master’s level or better in a STEM field such as Maths, Physics, Computer Science, Engineering, or equivalent practical experience

Nice to have

  • Experience with AWS, especially in relation to ML workflows with SageMaker, serverless compute, and storage such as S3 and Snowflake
  • Exposure to building products that leverage large language models and generative AI
  • Knowledge of the pharmaceutical industry, in particular the stages of pharmaceutical product development
  • Data visualisation skills with Matplotlib, Bokeh, Seaborn, Plotly or similar libraries


Benefits

  • Medical and prescription drug benefits
  • Health savings accounts or flexible spending accounts
  • Dental plans and vision benefits
  • Basic life and AD&D Benefits
  • 401k retirement plan
  • Short- and Long-Term Disability
  • Education benefits
  • Paid parental leave
  • Paid time off

The expected base salary for this position ranges from $140,000 to $150,000. It is not typical for offers to be made at or near the top of the range. Salary offers are based on a wide range of factors including relevant skills, training, experience, education, and, where applicable, licensure or certifications obtained. Market and organizational factors are also considered. In addition to base salary and a competitive benefits package, successful candidates are eligible to receive a discretionary bonus.

Norstella is an equal opportunities employer and does not discriminate on the grounds of gender, sexual orientation, marital or civil partner status, pregnancy or maternity, gender reassignment, race, color, nationality, ethnic or national origin, religion or belief, disability or age. Our ethos is to respect and value people’s differences, to help everyone achieve more at work as well as in their personal lives so that they feel proud of the part they play in our success. We believe that all decisions about people at work should be based on the individual’s abilities, skills, performance and behavior and our business requirements. MMIT operates a zero-tolerance policy towards any form of discrimination, abuse or harassment.

Sometimes the best opportunities are hidden by self-doubt. We disqualify ourselves before we have the opportunity to be considered. Regardless of where you came from, how you identify, or the path that led you here, you are welcome. If you read this job description and feel passion and excitement, we’re just as excited about you.

