What do we really expect from a data scientist who joins the requirements-gathering effort, and why should we involve them at such an early stage? Why is this so important to us?
Do you remember when I talked about the dynamic features of entities and business rules? This is where the DIFFERENT VERSIONS OF TRUTH come into play. Think again about these three principal tasks for a data scientist during the requirements-gathering phase:
- Collective ownership: remember that the data you are gathering and/or analyzing later is going to be messy and difficult to work with*
- Consider testability
- Create several requirements (models) in parallel
Let's say a software SIMULATION system is a tool for making these wishes come true in the best-case scenarios. Does that really help the truth? Software people know how simulation systems work: they know the difference between the discrete-time and continuous-time models used in them. We also have mathematicians who work with data generated from statistical sampling curves to make sense of the behaviour and status of a system. But they don't know what it means when it comes to DIFFERENT VERSIONS OF TRUTH! That is technical, scientific data analysis.
Let's say we don't have the luxury of access to "the" simulation tools for each scenario that needs to be analyzed to get the job done. That is beyond the time frame, budget and expertise available for this program, and it is very complicated. Now what?
Data science is still nascent and ill-defined as a field. “Data scientist” is often used as a blanket title for jobs that are drastically different. In general, most executives view data scientists as people who help others make data-driven decisions, while others argue that they should have a strong software engineering background with knowledge of machine learning methods (k-nearest neighbors, random forests, ensemble methods, etc.).
Let's remember one of the dearest mathematicians, astronomers, and geographers in Persian history, Muhammad ibn Musa al-Khwarizmi. The iconic Khwarizm royal symbol definitely didn't show up at random in the "IoAT readiness for Action" article on this blog!
People who interview candidates for a data scientist job usually ask some basic multivariable calculus or linear algebra questions, since these form the basis of many machine learning techniques.
You may wonder why a data scientist would need to understand this stuff when there are a bunch of out-of-the-box implementations already available in software packages. The answer is that, at a certain point, it can become worthwhile for a data science team to build its own implementations in house.
Understanding these concepts matters most in scenarios where the different versions of the truth are defined by the data, and where small improvements in predictive performance or algorithm optimization can lead to huge wins for the scenario whose different-versions-of-the-truth requirements you are gathering.
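To make the "build it in house" idea a bit more concrete, here is a minimal sketch of one of the methods named earlier, k-nearest neighbors, written in plain Python with no machine learning package. It is only an illustration and not a production implementation: the function name `knn_predict`, the toy points and the "low"/"high" labels are all made up for this example, and it assumes Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours, comparing on distance only.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (1.2, 1.4)))
print(knn_predict(points, labels, (6.4, 6.2)))
```

The distance calculation is where the linear algebra shows up. Once a team owns code like this, it can swap in a different distance measure or weighting scheme that suits its own data, which is the kind of small improvement an out-of-the-box package may not offer.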
What everyone agrees on about data scientists is that visualizing and communicating data is incredibly important, especially where you are making data-driven decisions. I guess this best captures the importance of the data scientist role in requirements gathering and analysis! A data scientist is responsible for handling a lot of data logging, and potentially for the development of data-driven scenarios.
To be continued in part 5 ...