What do data scientists get hired to do?

You have recently become interested in data science, and you enjoy spending time thinking about the best way to analyze your data. Your background isn’t computer science, though you’ve studied lots of statistics and may have even run a few experiments. You do a bit of programming and are decent at math. So you find yourself pondering what you can do after you finish your current project, and you would like to change fields for a number of different reasons.

You’re considering getting a data science job and want to harness the strengths developed during your graduate study. You want to take advantage of your intellectual curiosity. You want to maintain the rigor of experimental design you took part in. And you enjoy working with others and debating the nuanced interpretation of results. Lastly, you’ve found that you enjoy math and programming and feel that you could make the transition if you just knew how to get started.

What do data scientists get hired to do

One important step in getting started in a new field is to understand what the job or role actually involves. Unfortunately (or fortunately), the field of data science is so broad right now that what matters differs drastically from job to job. Some data scientists mostly build data plumbing. Some data scientists mostly build data cleaning services. Some data scientists do academic-style research. Some data scientists do a mix of all of the above to varying degrees. Some roles want natural language processing (NLP) skills, others want MapReduce experience, others want Hadoop, while others want Pig, and some others still want Spark. You’ll hear about people using SPSS, others using Incanter, others using Python, others using R, others using Weka, and yet another group using Scala. It’s enough to demotivate you, since you don’t want to learn something totally irrelevant to your area of specialization or to the data science job you’d find interesting.

When in doubt, look at the data

This should come as no surprise to you, but when you’re not sure, the important thing is to look at the data. To better understand what data scientists get hired to do, here’s the plan. We’re going to go to Indeed (a career website) and look at the first 3 pages of search results for the keyword "data scientist". This will cover 30 job postings. We will ignore the sponsored job advertisements and focus on the regular listings. For each listing, we’ll open it and figure out what the data scientist would be doing once hired. Then we’ll put together a list of the tasks that appeared. Note that the results may differ by the time you read this, as this search is being done today (December 16, 2014).

Note: though we are only going to look at Indeed, here is a list of recommended job websites you should take a look at...

Data science job responsibilities

The URL that we’ll use is the following one => http://www.indeed.com/jobs?q=%22data+scientist%22&l= and the search will be for "data scientist". Note that the results are for the United States, because my IP address and cookies are both based in the United States.
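If you want to repeat the search for another keyword, here is a minimal sketch of how the query string in that URL can be rebuilt with Python's standard library. The function name `build_search_url` is made up for illustration. The sketch reproduces only the `q` and `l` parameters visible in the URL above, and makes no claim about any other parameters the site may accept.

```python
from urllib.parse import urlencode

# Base address taken from the URL quoted in the post.
BASE_URL = "http://www.indeed.com/jobs"


def build_search_url(keyword: str, location: str = "") -> str:
    """Return a search URL with the keyword wrapped in double quotes.

    Hypothetical helper: it mirrors the two parameters seen in the post's
    URL (q = quoted keyword, l = location, left empty there).
    """
    # The post's URL contains %22 around the keyword, i.e. literal quotes.
    params = {"q": f'"{keyword}"', "l": location}
    # urlencode escapes the quotes as %22 and the space as "+".
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("data scientist")
print(url)
```

Calling `build_search_url("data scientist")` yields the same address as the one used for this post, so swapping in a different job title is a one-word change.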

Here are the responsibilities and accountabilities:

  1. Able to advise senior management in clear language about the implications of their work for the organization
  2. Acquire, clean, and structure data from multiple sources
  3. Actively be involved in growing the group and streamlining the people and processes to increase efficiency and effectiveness.
  4. Actively engage in hands on development of offerings, sales kits and other marketing materials.
  5. Analyze, report, and present on findings and behavioral trends
  6. Assist in the implementation of the results of modeling and analysis through partnership with Clinical Operations, IT and other organizational entities
  7. Assists business with causal inferences & observations by finding patterns and relationships in data
  8. Author white papers and technical blog posts
  9. Become a local marketing subject matter expert and adapt to our fast-paced environment
  10. Build a super-fast person search engine with billions of documents
  11. Build and deploy data analysis systems for large data sets
  12. Collaborate with business and technical teams to formulate the problem, recommend a solution approach and design a data architecture
  13. Collect and manage large data sets to perform complex data analysis, communicate the results and their implications to the business stakeholders
  14. Conceive of and develop tools to minimize risk of experimentation
  15. Constantly develop professional knowledge and skills
  16. Constantly explore new opportunities, niches and trends to identify and develop new value-add offerings
  17. Consult internal groups in the use and integration of machine learning
  18. Coordinate with the Information Systems team in gathering data and implementing solutions
  19. Customer churn scoring
  20. Customer conversion scoring and optimization
  21. Define and manage collaboration initiatives with outside research partners from business and academia
  22. Define and set up internal and external cloud computing environment including Hadoop clusters, parallel computing software, and applications and algorithms to process large amounts of data.
  23. Deploy algorithms to enable specific business applications
  24. Design and develop automated data analysis frameworks for systematic analysis for advanced data applications
  25. Design and implementation of pre-processing and warehousing pipelines for biomedical data
  26. Design experiments to answer targeted questions and communicate informed conclusions and recommendations
  27. Design experiments to identify causal factors
  28. Design, build and deploy a large scale Record Linkage system to find relationships among 7+ billion person records
  29. Determine the best analysis methods leveraging statistical and analytical best practices
  30. Develop algorithms for optimal device control
  31. Develop analysis plans and implement appropriate modeling techniques to answer complex business questions
  32. Develop and execute Actionable segmentation
  33. Develop and execute Advanced Predictive Modeling
  34. Develop and execute Demand Forecasting
  35. Develop and execute Digital Intelligence
  36. Develop and execute end to end analytics solutions to drive profitable customer growth
  37. Develop and execute Marketing Performance
  38. Develop and execute Pricing Optimization
  39. Develop and execute ROI Enhancement
  40. Develop and execute Single View of Customer
  41. Develop and execute Web Analytics
  42. Develop and monitor health-related outcomes and other metrics to support the monitoring and evaluation of current and newly emerging products
  43. Develop and optimize real-time data-driven algorithms that optimize viewer quality of experience across devices, networks, and content
  44. Develop metrics and prototypes that can be used to drive business decisions
  45. Develop new algorithms and methods for optimizing revenue and key performance metrics.
  46. Develop one off experiments for large company initiatives and design the statistical analysis of the results
  47. Develop predictive and descriptive models using advanced procedures
  48. Develop predictive and prescriptive statistical or behavioral models
  49. Develop predictive models for important business- and people-centered outcomes
  50. Develop product offerings through careful consideration of business value and data analyses
  51. Develop roadmap for algorithmic bidding platform to optimize digital marketing investments
  52. Develop software, algorithms and applications to apply mathematics to data, perform large scale experimentation and build data driven apps to translate data into intelligence, solve a variety of business problems and enable business strategy
  53. Develop, test and validate algorithms
  54. Development of data visualization and analysis tools
  55. Drive change by closely collaborating with internal stakeholders in Data Science, Website Engineering and Category Management
  56. Drive the collection of new data and the refinement of existing data sources
  57. Enhance data analytics team's understanding of machine learning techniques and algorithms through consulting, training, and seminars
  58. Evaluate and optimize the people search engine
  59. Experience in dealing with real world data in one or more of the following areas: machine learning, data science, probabilistic inference and/or computational statistics
  60. Explore billions of records, research and develop predictive models and optimization algorithms for ad targeting and bidding on Ad Exchanges
  61. Explore high-level, undefined ideas and business problems using unstructured, raw data
  62. Extract insights and actionable recommendations from large volumes of data
  63. Formulate business problems, translate them into data science projects and provide solution approaches
  64. Grow our real-time internal data intelligence API
  65. Grow our service provider base by identifying & improving recruiting & retention drivers
  66. Help build and manage US-based and EU consulting practices
  67. Help the business understand and evaluate data science use-cases appropriate for their businesses
  68. Ideation, prototyping and creation of intellectual property
  69. Identify high impact areas for novel proprietary algorithms
  70. Identify methods that allow continuous and automated statistical testing to enhance the predictability of deployed models
  71. Identify resources and courses to add to internal education program
  72. Identify state of the art algorithms to perform core data science functions, including machine learning, optimization, and statistical analysis
  73. Implement these models and algorithms, leveraging grid computing on Hadoop and Hive
  74. Implementation of data-driven algorithms that enhance the performance of our system.
  75. Improve internal processes and tools to increase efficiency and spur future product innovation
  76. Improve service reliability & quality by identifying the underlying drivers of issues
  77. Inspire the adoption of advanced analytics and data science across different teams and functions
  78. Institute rigorous test-and-learn methodology to achieve desired results
  79. Integrate algorithms within current enterprise analytics platforms to support business intelligence applications
  80. Integrate research outcomes within internal capabilities
  81. Interpret data and communicate complex findings to leaders in HR and across the business
  82. Isolate the incremental financial impact of the business question under investigation
  83. Leads scoring
  84. Leverage data mining and machine learning approaches to model and predict end user behavior
  85. Maintain an engaged network of scholars and practitioners to maximize learning and idea exchange
  86. Maintain familiarity with current trends in health behavior research
  87. Maintain transparency by partnering with others to document and communicate results of analyses as well as the processes used to develop and implement analyses and predictive analytics.
  88. Management of interdisciplinary teams on individual projects
  89. Manipulate and analyze complex, high-volume, high-dimensionality data from varying sources
  90. Manipulate and integrate a variety of data sources in the data preparation phase
  91. Marketing mix modeling and planning
  92. Media attribution
  93. Mine experiment data for issues and unidentified wins, then automate and develop tooling around that
  94. Mine our vast customer data to form hypotheses, deploy test & drive metrics every day.
  95. Need to be able to link and mash up distinctive data sets to discover new insights
  96. Organize and participate in internal and external seminars
  97. Organize educational seminars
  98. Participate (and lead) pre-sales activities related to consulting opportunities
  99. Participate in building current customer base awareness
  100. Participate in building target market awareness ... by contributing to marketing initiatives including social and media presence – presenting on events, publishing materials in related magazines and web resources, blogging, etc
  101. Participate in cutting edge statistical analysis and predictive analytics
  102. Participation in manuscript preparation
  103. Partner with premier digital marketing companies to understand and suggest new opportunities, and work to test those new opportunities in a quest for additional revenue and margin
  104. Provide exemplary Analytics consulting services fulfillment
  105. Provide internal consulting to answer key product questions and drive product decisions
  106. Publish in top-tier journals, file patent applications, and develop relevant applications that support the business
  107. Rapidly create, test and improve innovative bidding algorithm to drive revenue and ROI goals
  108. Real-time online media optimization
  109. Research, develop, and implement predictive algorithms for our real time experimentation system
  110. Responsible for the categorization and optimization technologies that are the foundation components of our platform
  111. Sales operation analytics
  112. Scouting of novel technologies related to distributed architectures
  113. Spur future product innovation
  114. Summarize and present conclusions and solutions
  115. Supervision of graduate students
  116. Support relationships between development and client services so that all are appropriately aligned to meet team objectives
  117. Support Senior Strategist in planning, conducting and synthesizing research
  118. Technology and business model evaluation for automotive applicability
  119. Train and develop the less experienced data scientists and analysts on the team
  120. Train internal staff on use and maintenance of resources
  121. Train, tune, and cross-validate a range of machine learning algorithms
  122. Transform data into insights to identify & quantify business opportunities
  123. Understanding of how a business and strategy works
  124. Use and/or create software tools to gain insights into underlying data
  125. Use SAS to build, implement, and regularly monitor the effectiveness of predictive models
  126. Uses predictive modeling, statistics, Machine Learning, Data Mining, and other data analysis techniques to collect, explore, and extract insights from structured and unstructured data
  127. Work cross-functionally to establish reporting, instrumentation, and metric standards
  128. Work with a broad spectrum of decision makers to determine the goals and expected results to their business questions
  129. Work with a team of researchers on both theoretical and practical projects that will use his/her scientific, mathematical and computational skills
  130. Work with building energy scientists to analyze and extend the capabilities of company's physics-based energy simulation model
  131. Work with complex data from various sources
  132. Work with cross functional teams to deliver yearly financial goals by implementing, managing, and communicating monetization programs
  133. Work with our Analysts and Marketing teams to understand client goals, and work with Engineering team to turn research into products
  134. Work with team members to collect data for ad-hoc and statistical analysis
  135. Work with team members to develop the appropriate analytical methods to apply to outcomes research and discovery
  136. Write research papers for internal audiences
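
A list this long is easier to digest once you look at the data in it too. As a rough sketch, and not the method used to compile the list, the snippet below tallies the leading word of a handful of the responsibilities above to show which kinds of actions come up most. The sample is only eight items copied from the list, and the names `leading_verb` and `tally_leading_verbs` are made up for illustration.

```python
from collections import Counter

# A small sample copied from the list above (not all 136 items).
SAMPLE = [
    "Acquire, clean, and structure data from multiple sources",
    "Build and deploy data analysis systems for large data sets",
    "Design and develop automated data analysis frameworks for systematic analysis for advanced data applications",
    "Design experiments to answer targeted questions and communicate informed conclusions and recommendations",
    "Develop and execute Demand Forecasting",
    "Develop, test and validate algorithms",
    "Develop predictive and descriptive models using advanced procedures",
    "Train, tune, and cross-validate a range of machine learning algorithms",
]


def leading_verb(item: str) -> str:
    """Return the first word of an item, lowercased, without punctuation."""
    return item.split()[0].strip(",.;:").lower()


def tally_leading_verbs(items: list[str]) -> Counter:
    """Count how often each leading word opens a responsibility."""
    return Counter(leading_verb(item) for item in items)


counts = tally_leading_verbs(SAMPLE)
for verb, n in counts.most_common():
    print(f"{verb}: {n}")
```

Pasting the full list into `SAMPLE` would give a quick, crude view of how much of the work is about developing, designing, or communicating. The first word is only a proxy, since items such as "Customer churn scoring" do not start with a verb at all.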

Data scientists do everything and must do it well!

As you can see, a data science job can cover anything from "Author white papers and technical blog posts" to "Real-time online media optimization". In just 30 job postings (3 pages of Indeed results), we saw 136 different responsibilities listed. Some are very similar and others are very different. This is one of the fortunate or unfortunate things about the data science field at the moment: it is so big that what matters, and what you’d actually do, differs drastically from job to job.

So, when looking for a data science job, realize that some data scientists mostly build data plumbing, some data scientists mostly build data cleaning services, some data scientists do academic-style research, and some data scientists do a mix of all of the above to varying degrees. That means it’s well worth taking a few hours or days to go through the websites listed above and read through the job descriptions and responsibilities to see what sticks out and what you’d like to do. After all, the companies above are all hiring and are actively looking for people to take on those roles right now.

Get to it and good luck!
