Janssen Research & Development LLC is a Johnson & Johnson company. We are recruiting a Senior Data Scientist, Bioinformatician to join the Data Solutions, Privacy and Ethics department. The primary locations for this position are Titusville, NJ; Spring House, PA; Boston, MA; South San Francisco, CA; and San Diego, CA, but alternate locations will be considered.
Janssen develops treatments that improve the health and lifestyles of people worldwide. Research and Development areas encompass Oncology, Cardiovascular and Metabolic disorders, Immunology, Neuroscience, and Infectious diseases. Our goal is to help people live longer, healthier lives. We have produced and marketed many first-in-class prescription medications and are poised to serve the broad needs of the healthcare market - from patients to practitioners and from clinics to hospitals. To learn more about Janssen, one of the Pharmaceutical Companies of Johnson & Johnson, visit https://www.janssen.com/.
As a Pipeline Data Engineer, the candidate will play a key role in developing, maintaining, and optimizing omics-focused data processing pipelines underpinned by Nextflow Tower, while working with the broader R&D Data Science community to build efficiencies in the development of paired real world data (RWD) phenotypic cohort definitions. The primary responsibility will be to ensure the efficient and reliable processing of large-scale RWD, genomics, and bioinformatics datasets, while collaborating with data scientists, bioinformaticians, and engineers to develop innovative solutions for data processing and analysis.
In This Role, You Will:
- Develop, maintain, and optimize Nextflow workflows and data pipelines to support the processing and analysis of large-scale genomics and bioinformatics datasets.
- Actively contribute to building and developing RWD phenotypic libraries and pipelines.
- Collaborate with data scientists, bioinformaticians, and software engineers to design and implement scalable and efficient data processing solutions.
- Monitor and troubleshoot data pipeline performance, identifying bottlenecks and optimizing workflows to increase efficiency and minimize costs.
- Develop and implement standard processes for data quality, data validation, and data provenance.
- Integrate new data sources, technologies, and tools into our Nextflow Tower-based infrastructure.
- Work closely with the IT and DevOps teams to ensure the security, availability, and performance of our Nextflow Tower environment.
- Build and maintain technical documentation for data pipelines and workflows, including user guides, tutorials, and API documentation.
- Stay up to date with the latest developments in the Nextflow ecosystem and seek out opportunities to apply new features and technologies.
Qualifications:
- Bachelor’s degree in Computer Science, Bioinformatics, or a related field (Master’s degree preferred).
- 5+ years of work experience. Broader consideration given to candidates with graduate degrees.
- Strong experience with pipeline management tools (e.g., Nextflow, SageMaker), including the development, optimization, and deployment of complex workflows.
- Familiarity with genomic data formats (e.g., FASTQ, BAM, VCF) and bioinformatics tools (e.g., BWA, GATK, STAR).
- Proficiency in one or more programming languages (e.g., Python, Java, R).
- Experience with cloud-based data processing platforms (e.g., AWS, GCP, or Azure) and containerization technologies (e.g., Docker, Singularity).
- Experience working with real world data (EHR, Medical Claims, Health Survey, Registry).
- Experience working with and executing scripts against large databases such as Amazon Redshift, Snowflake, PostgreSQL.
- Experience with R Shiny, Flask, and HTML.
- Ability to translate discussions into actionable user requirements and project plans.
- Capable of technical execution of a project, including managing external third parties and vendors.
- Excellent problem-solving and troubleshooting skills, with a focus on optimizing performance and minimizing costs.
- Excellent communication, interpersonal, and written skills.
- Knowledge of medical terminologies (e.g., ICD, NDC, SNOMED, MedDRA, LOINC, CPT).
- Working knowledge of healthcare data standards such as HL7, FHIR, OMOP, SDTM and/or ADaM.
- Understanding of data engineering principles, including data modeling, ETL, and data validation.
- Capable of drafting and/or contributing to peer-reviewed publications, patents, and external presentations.
Expected Salary: $113,000 - $170,000