Choose a Career as a Data Scientist to Fulfil Your Dreams!

About Blog:

My previous blog gave you an idea of the basics of the Python language. Today I am going to introduce you to a relatively new job profile: the Data Scientist. Just as scientists research the existing knowledge in their field to discover new ideas, patterns and processes that make our daily activities easier, a data scientist does the same with data.

This article will help you think about this new job profile and learn how data scientists work, what tasks they carry out, what process they follow, and what basic education and qualifications are required to become a data scientist.

Data Science process flow:

A data scientist follows a standard process to help the company reach sound decisions: first understand the business need, then process the data by applying a suitable data model, through the stages described below.

Stage 1: Business Understanding:

This stage begins when you meet your client or stakeholder and ask relevant questions about the product or service. Based on this conversation, the data scientist defines the objectives for the problems or queries that need to be tackled, so that the best solution can be provided to the client.

Stage 2: Data Acquisition:

Based on the objectives defined in the previous stage, the data scientist needs to collect the relevant data. This data can come from different sources such as social media posts, web servers, databases, maintained logs, transactions, email communication, online repositories, APIs, etc.

Image credit/source: www.google.com (respective page)

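This activity is sometimes loosely called data mining, in the sense of finding the necessary data in large databases. Strictly speaking, data mining refers to discovering patterns in data that has already been collected.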
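To make the idea of collecting data from several sources concrete, here is a minimal sketch in Python. It assumes two made-up sources: a small CSV export and an in-memory SQLite table standing in for a company database. The table name, column names and sample values are all hypothetical and only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export, e.g. downloaded from an online repository.
CSV_EXPORT = """customer_id,channel,amount
1,web,120.50
2,email,75.00
3,web,42.25
"""


def load_csv_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def build_sample_db():
    """Create an in-memory database that stands in for a real company database."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    connection.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Asha"), (2, "Ravi"), (3, "Meera")],
    )
    return connection


def load_db_rows(connection):
    """Read customer records from the hypothetical customers table."""
    cursor = connection.execute(
        "SELECT customer_id, name FROM customers ORDER BY customer_id"
    )
    return [{"customer_id": row[0], "name": row[1]} for row in cursor]


def acquire():
    """Combine both sources into one list of records keyed by customer_id."""
    connection = build_sample_db()
    names = {row["customer_id"]: row["name"] for row in load_db_rows(connection)}
    connection.close()

    records = []
    for row in load_csv_rows(CSV_EXPORT):
        customer_id = int(row["customer_id"])
        records.append(
            {
                "customer_id": customer_id,
                "name": names.get(customer_id),
                "channel": row["channel"],
                "amount": float(row["amount"]),
            }
        )
    return records


if __name__ == "__main__":
    for record in acquire():
        print(record)
```

In a real project the CSV text and the database would of course come from the client's systems; the point here is only that each source is read separately and then joined on a shared key.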

Stage 3: Data Preparation:

This stage has two steps: data cleaning and data transformation. Data cleaning removes or corrects problematic data stored in the database or repository, i.e. inconsistent data types, misplaced attributes, and missing or duplicate values. Different techniques can be used to fix these problems and to handle the missing data.



Image credit/source: www.google.com (respective page)
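The cleaning step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the sample rows are invented, missing ages are filled with the median of the known ages, and missing cities get the placeholder "Unknown". Other projects may prefer different rules.

```python
from statistics import median

# Hypothetical raw records showing the typical problems:
# inconsistent types, a duplicate row, and missing values.
RAW_ROWS = [
    {"id": "1", "age": "34", "city": "Pune"},
    {"id": "2", "age": "", "city": "Mumbai"},
    {"id": "2", "age": "", "city": "Mumbai"},       # exact duplicate
    {"id": "3", "age": "forty", "city": "Nagpur"},  # not a number
    {"id": "4", "age": "28", "city": ""},           # missing city
]


def to_int_or_none(value):
    """Convert a string to int, or return None when it is empty or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def clean_rows(rows):
    # Step 1: drop exact duplicates while keeping the original order.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Step 2: give every column one consistent type; unusable values become None.
    typed = [
        {
            "id": to_int_or_none(row["id"]),
            "age": to_int_or_none(row["age"]),
            "city": row["city"].strip() or None,
        }
        for row in unique
    ]

    # Step 3: fill missing values (median for numbers, a placeholder for text).
    known_ages = [row["age"] for row in typed if row["age"] is not None]
    fallback_age = median(known_ages)
    for row in typed:
        if row["age"] is None:
            row["age"] = fallback_age
        if row["city"] is None:
            row["city"] = "Unknown"
    return typed


if __name__ == "__main__":
    for row in clean_rows(RAW_ROWS):
        print(row)
```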

The data transformation activity can be carried out using software tools such as Talend, Informatica, etc.

Image credit/source: www.google.com (respective page)
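Tools like these are usually driven through their own interfaces, so the following is not how Talend or Informatica work internally. It is just a small Python sketch of two common transformations, assuming we want to rescale a numeric column to the range 0 to 1 and turn a text column into 0/1 indicator columns. The function names and sample values are my own.

```python
def min_max_scale(values):
    """Rescale numbers so the smallest becomes 0.0 and the largest 1.0."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def one_hot(values):
    """Turn a list of category labels into one 0/1 indicator dict per value."""
    categories = sorted(set(values))
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]


if __name__ == "__main__":
    amounts = [10, 20, 30]               # hypothetical numeric column
    channels = ["web", "email", "web"]   # hypothetical text column
    print(min_max_scale(amounts))
    print(one_hot(channels))
```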

Stage 4: Exploratory Data Analysis:

In this stage the data is defined and refined after the transformation process. Based on the business requirement, the proper features are selected and the variables that will be used in model development are finalized. This involves exploratory graphs, numerical summaries of the data, and an understanding of the terminology used in the data. This process can be carried out with the help of the R programming language.

Image credit/source: www.google.com (respective page)
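R is the tool named above, but since my previous blog covered Python, here is a rough Python sketch of the numerical summary part. It assumes two invented columns of numbers and computes the usual summary values plus the Pearson correlation between the columns. The helper names are hypothetical.

```python
import math
from statistics import mean, median, stdev


def summarize(values):
    """Return the basic numerical summary of one numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return covariance / (spread_x * spread_y)


if __name__ == "__main__":
    hours_studied = [1, 2, 3, 4, 5]      # hypothetical sample data
    test_score = [52, 58, 61, 70, 74]    # hypothetical sample data
    print(summarize(hours_studied))
    print(summarize(test_score))
    print(pearson(hours_studied, test_score))
```

A correlation close to 1 or -1 suggests that one variable may be a useful feature for predicting the other, which is the kind of hint this stage is meant to produce.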

Stage 5: Feature Engineering:

This is a core activity in the data science life cycle. Here the important features are identified, and more meaningful ones are constructed from the raw data that we have. Popular machine learning algorithms such as KNN, Naïve Bayes, decision trees, etc. are then used to model the data. Each model is trained on a training data set and tested on a sample data set, which helps to select the best-performing model. These models can be developed in programming languages like Python, R and SAS.
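As an illustration of the train-and-test idea, here is a very small KNN (k-nearest neighbours) classifier written in plain Python. It is a sketch under my own assumptions: the eight data points and their "low"/"high" labels are invented, the split keeps 25% of the data for testing, and distances are ordinary Euclidean distances. In practice you would normally use a machine learning library rather than write this by hand.

```python
import math
import random
from collections import Counter

# Hypothetical labelled data: (features, label). Two well-separated groups.
DATA = [
    ((1.0, 1.1), "low"), ((1.2, 0.9), "low"),
    ((0.8, 1.0), "low"), ((1.1, 1.3), "low"),
    ((5.0, 5.2), "high"), ((5.1, 4.9), "high"),
    ((4.8, 5.0), "high"), ((5.3, 5.1), "high"),
]


def train_test_split(data, test_ratio=0.25, seed=0):
    """Shuffle a copy of the data and split it into training and test parts."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, point, k=3):
    """Predict the label of `point` by majority vote of its k nearest neighbours."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, features, k) == label for features, label in test)
    return correct / len(test)


if __name__ == "__main__":
    train, test = train_test_split(DATA)
    # Compare a few values of k and keep the best-performing one.
    for k in (1, 3):
        print(f"k={k}: accuracy={accuracy(train, test, k)}")
```

The same pattern (train on one part, measure on another, compare the scores) applies when choosing between KNN, Naïve Bayes and decision trees.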

 

Stage 6: Data Modeling:

Data modeling has three types: conceptual data modeling, logical data modeling and physical data modeling.

In this stage, data models are created for the data stored in the database. With the help of a data model we can enforce business rules, regulatory compliance and policies on the data. A data model also ensures consistency in naming conventions, default values and semantics, and helps in checking the quality of the data. These models are usually represented using ER diagrams, UML diagrams, etc.
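To show how a physical data model can enforce business rules and default values, here is a small sketch using Python's built-in sqlite3 module. The two tables, their column names, the default country code 'IN' and the rule "amount must be positive" are all assumptions made up for this example.

```python
import sqlite3

# Hypothetical physical data model: naming conventions, a default value,
# and business rules expressed as constraints.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    full_name   TEXT NOT NULL,
    country     TEXT NOT NULL DEFAULT 'IN'
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL CHECK (amount > 0)
);
"""


def build_database():
    """Create an in-memory database with foreign key checks switched on."""
    connection = sqlite3.connect(":memory:")
    connection.execute("PRAGMA foreign_keys = ON")
    connection.executescript(SCHEMA)
    return connection


def try_insert(connection, sql, params):
    """Run one insert; return False when the data model rejects the row."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_database()
    order_sql = "INSERT INTO customer_order (customer_id, amount) VALUES (?, ?)"
    print(try_insert(db, "INSERT INTO customer (customer_id, full_name) VALUES (?, ?)", (1, "Asha")))
    print(try_insert(db, order_sql, (1, 250.0)))   # valid order
    print(try_insert(db, order_sql, (1, -5.0)))    # breaks the amount rule
    print(try_insert(db, order_sql, (99, 10.0)))   # unknown customer
```

Because the rules live in the data model itself, every program that writes to the database is held to the same standard.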

Stage 7: Data Visualization:

Data visualization is nothing but the representation of sorted data in a proper graphical form. Data can be viewed in different styles such as bar charts, pie charts, scatter plots, line graphs, etc. Sometimes these styles also show the relationship between different objects and characteristics.

Data visualization combines art and science: deciding what to display and how to display it so that the client understands the workflow correctly.

This activity can be completed using different tools such as Tableau, Power BI, QlikView, etc.

Image credit/source: www.google.com (respective page)
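Dedicated tools like the ones named above produce far richer charts, but the basic idea of a bar chart can be shown with a few lines of plain Python. This is only a text-based stand-in under my own assumptions: the function name, the sample figures and the choice of "#" as the bar symbol are all hypothetical.

```python
def text_bar_chart(data, width=20):
    """Draw a horizontal bar chart as text; the largest value fills `width`."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical number of orders per sales channel.
    orders_per_channel = {"web": 40, "email": 25, "phone": 10}
    print(text_bar_chart(orders_per_channel))
```

Even in this simple form, the choice of what to show (here, one bar per channel, scaled to the largest value) decides how quickly the reader understands the data.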

Conclusion:

Currently the data science domain has job openings for freshers as well as for experienced people. Here I have given an idea of the different activities performed by a data scientist so that you can start learning these concepts and understand the different software tools related to this process. Images used in this blog are downloaded from Google search.

Keep visiting my blog to read about such interesting topics and new technologies. This will definitely help you choose the right career for a bright future.

Thank you for reading my blog. Kindly give me your opinion about this blog in the comment box and share it with your friends.
