Employment Information
The Web Data Platform team at Microsoft is looking for a highly skilled and motivated Senior Data Scientist to join our team focused on improving web data quality by identifying and mitigating junk URLs at scale. In this role, you will work on some of the largest and most complex datasets in the world, leveraging state-of-the-art machine learning techniques and cutting-edge technologies to enhance the quality of data ingested for search and other web-based services. Your work will directly impact millions of users by improving search relevance, content quality, and user experience.
Ideal candidates can take a business or engineering problem from a PM or engineering leader and translate it into a data science problem. This includes identifying and deeply understanding potential data sources, conducting the appropriate analysis or modeling to reveal actionable insights, and then working with data or AI engineers to operationalize the resulting metrics or solutions.
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Basic Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Science, Statistics, Mathematics, or a related field.
- 7+ years of experience in data science, machine learning, or a related field.
Preferred Qualifications:
- Expertise in Python, C#, or another programming language for building scalable solutions.
- Strong hands-on experience with big data technologies such as Apache Spark, Databricks, or Azure Data Lake.
- Proficiency in machine learning frameworks like PyTorch, TensorFlow, or scikit-learn.
- In-depth knowledge of algorithms for classification, clustering, and anomaly detection.
- Experience working with web data, including techniques for crawling, parsing, and feature engineering on unstructured data.
- Familiarity with techniques for handling noisy or imbalanced datasets.
- Knowledge of search engines, URL patterns, or web-related NLP is a strong plus.
Soft Skills:
- Strong problem-solving and analytical skills.
- Excellent communication and stakeholder management abilities.
- A growth mindset with a passion for learning and innovation.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Web Data Analysis:
- Analyze massive-scale web data spanning billions of URLs to identify patterns and trends in junk or low-quality URLs.
- Design and implement robust feature extraction pipelines to characterize web content quality.
Machine Learning Models:
- Build and deploy scalable machine learning models to classify URLs as junk, spam, or relevant.
- Develop ensemble methods combining rules-based systems with supervised and unsupervised models.
Scalable Data Processing:
- Leverage distributed computing frameworks like Apache Spark or Azure Synapse to process and analyze large-scale datasets efficiently.
- Optimize data pipelines for performance, scalability, and maintainability.
Collaboration with Engineers:
- Partner with engineering teams to integrate ML models into production pipelines.
- Work closely with product managers and stakeholders to define success metrics and iterate on solutions.
Innovation and Research:
- Stay current with the latest developments in machine learning, NLP, and web data analytics.
- Foster a culture of inclusivity and of disciplined data and software engineering practices that deliver data-driven business value.