How to become a Data Analyst – Part 1
I have defined a course path for anyone who wants to become a data analyst. The journey has several parts, covering terminology, definitions, videos and other material.
Data is a collection of facts.
Data analytics is the science of data. Data analysis and the data ecosystem both fit within data analytics.
Data analysis: the collection, transformation and organization of data in order to draw conclusions, make predictions and drive informed decision making.
Data analyst: an explorer, a detective and an artist.
Data analysts answer questions with data by generating insights.
Data science: creating new ways of modeling and understanding the unknown by using raw data. For example, data scientists create new questions using data.
Data science encompasses three disciplines:
- Machine learning
- Statistics
- Analytics
Subject matter experts: people who can look at the results of a data analysis, identify any inconsistencies, make sense of gray areas and ultimately validate the choices being made.
Analysts use the following step-by-step process:
1. Ask: ask the right questions, which includes:
- Defining the problem
- Understanding stakeholder expectations
2. Prepare: collect and store the data.
3. Process: clean and check the data, for example by removing outliers.
4. Analyze: explore the data to find patterns.
5. Share: present the findings to your audience, for example with visualizations.
6. Act: act on the insights from the data.
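The Process step mentions removing outliers without saying how to find them. As one possible approach (a common rule of thumb swapped in here, not something this course path prescribes), the Python sketch below keeps only values inside Tukey's fences, dropping anything more than 1.5 × IQR below the first quartile or above the third. The function name `remove_outliers_iqr` and the sample order counts are hypothetical.

```python
from statistics import quantiles


def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns three cut points: Q1, the median and Q3
    # (using the default "exclusive" method)
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Preserve the original order of the values that pass the check
    return [v for v in values if low <= v <= high]


# Hypothetical daily order counts with one suspicious spike
orders = [12, 15, 14, 13, 16, 15, 14, 120, 13, 15]
cleaned = remove_outliers_iqr(orders)
print(cleaned)  # the spike of 120 is dropped
```

Treat flagged values as candidates to investigate rather than automatic deletions: an unusual value can be a real event, not an error.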
A stakeholder is someone who has invested time and resources into a project and is interested in its outcome.
Analytical skills are the qualities and characteristics associated with solving problems using facts. They include:
- Curiosity: a desire to know more.
- Understanding context: understanding where information fits into the big picture.
- Data strategy: managing the people, processes and tools used in data analysis.
- Technical mindset: breaking things down into smaller steps and working with them in an orderly, logical way.
Five aspects of analytical thinking:
1. Visualization
2. Strategy
3. Problem-orientation
4. Correlation
5. Big-picture and detail-oriented thinking
Gap analysis is a method for examining and evaluating how a process works currently in order to get where you want to be in the future.
Data Life Cycle:
- Plan: Decide what kind of data is needed, how it will be managed, and who will be responsible for it.
- Capture: Collect or bring in data from a variety of different sources.
- Manage: Care for and maintain the data. This includes determining how and where it is stored and the tools used to do so.
- Analyze: Use the data to solve problems, make decisions, and support business goals.
- Archive: Keep relevant data stored for long-term and future reference.
- Destroy: Remove data from storage and delete any shared copies of the data.
Tools used by data analysts:
1. Spreadsheets
2. Query languages for databases
3. Visualization tools
Common problem types:
1. Making predictions: using data to make informed decisions about how things may be in the future.
2. Categorizing things: assigning information to different groups or clusters.
3. Spotting something unusual
4. Identifying themes: grouping categorized information into broader, higher-level concepts.
5. Discovering connections: combining data and insights to address similar challenges faced by different entities.
6. Finding patterns
Examples of SMART questions
Here’s an example that breaks down the thought process of turning a problem question into one or more SMART questions using the SMART method: What features do people look for when buying a new car?
- Specific: Does the question focus on a particular car feature?
- Measurable: Does the question include a feature rating system?
- Action-oriented: Does the question influence creation of different or new feature packages?
- Relevant: Does the question identify which features make or break a potential car purchase?
- Time-bound: Does the question validate data on the most popular features from the last three years?
Avoid these types of questions:
- Leading questions: questions that only have a particular response.
- Closed-ended questions: questions that ask for a one-word or brief response only.
- Vague questions: questions that aren’t specific or don’t provide context.
Data-inspired decision-making explores different data sources to find out what they have in common.
Quantitative data is a specific and objective measure of numerical facts. It can be discrete or continuous.
Qualitative data is a subjective or explanatory measure of qualities and characteristics. It is usually listed as a name, category or description, and it can be nominal (no order) or ordinal (with order). Qualitative data helps answer “why” questions.
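To make these categories concrete, here is a small Python sketch using hypothetical survey data. It shows that nominal data can be counted but has no meaningful ranking, ordinal data can be sorted once you define its order (the `SATISFACTION_SCALE` list is an assumed scale), and quantitative data supports arithmetic such as averages.

```python
from collections import Counter

# Hypothetical survey data illustrating each kind of data
colors = ["red", "blue", "red", "green", "blue", "red"]  # qualitative, nominal (no order)
ratings = ["good", "poor", "excellent", "fair", "good"]  # qualitative, ordinal (ordered)
items_bought = [1, 3, 2, 5, 1]                           # quantitative, discrete (counts)
basket_weight_kg = [0.75, 2.4, 1.05, 3.2, 0.5]           # quantitative, continuous (measurements)

# Assumed rating scale, listed from lowest to highest
SATISFACTION_SCALE = ["poor", "fair", "good", "excellent"]

# Nominal data: we can count categories, but ranking them is meaningless
color_counts = Counter(colors)

# Ordinal data: sorting only makes sense with the scale's defined order
sorted_ratings = sorted(ratings, key=SATISFACTION_SCALE.index)

# Quantitative data: arithmetic such as an average is meaningful
avg_items = sum(items_bought) / len(items_bought)
avg_weight = sum(basket_weight_kg) / len(basket_weight_kg)

print(color_counts.most_common(1))
print(sorted_ratings)
print(avg_items, round(avg_weight, 2))
```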
There are two common data presentation tools:
1. Reports: spreadsheets can be used.
A report is a static collection of data given to stakeholders periodically.
Advantages:
a. High-level historical data
b. Easy to design
c. Clean and sorted data
Disadvantages:
a. Continual maintenance
b. Less visually appealing
2. Dashboards: Tableau can be used.
A dashboard monitors live, incoming data.
A pivot table is a data summarization tool used in data processing.
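A pivot table groups rows by one field, spreads another field across columns and aggregates a value in each cell. The Python sketch below reproduces that idea with a SUM aggregation using only the standard library; the sales records and the `pivot_sum` helper are hypothetical. In a spreadsheet you would build the same summary through the pivot table menu, and in pandas `DataFrame.pivot_table` serves this purpose.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, amount)
sales = [
    ("North", "Widget", 120.0),
    ("North", "Gadget", 80.0),
    ("South", "Widget", 200.0),
    ("North", "Widget", 30.0),
    ("South", "Gadget", 50.0),
]


def pivot_sum(records):
    """Build {row_key: {column_key: total}}, like a pivot table using SUM."""
    table = defaultdict(lambda: defaultdict(float))
    for row_key, col_key, value in records:
        table[row_key][col_key] += value
    # Convert the nested defaultdicts to plain dicts for display
    return {row: dict(cols) for row, cols in table.items()}


pivot = pivot_sum(sales)
for region, totals in sorted(pivot.items()):
    print(region, totals)
```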
Metrics are quantifiable data types used for measurement.
Big Data has three Vs: Volume, Variety and Velocity.
Structured thinking is the process of recognizing the current problem or situation, organizing available information, revealing gaps and opportunities, and identifying the options.
Ways data is collected include:
- Observations (often used by scientists)
- Cookies (to observe online activity)
First-party data is data collected by an individual or group using their own resources.
Second-party data is data collected by a group directly from its audience and then sold.
Third-party data is data provided by outside sources who did not collect it directly.
A population is all possible data values in a data set. A sample is a part of a population that is representative of the population.
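One common way to draw a sample is simple random sampling, where every member of the population has an equal chance of being chosen. The sketch below uses Python's `random.sample` on a hypothetical, randomly generated population of customer ages and compares the two means. Random selection makes a representative sample more likely but does not guarantee one, especially when the sample is small.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: the age of every customer in a store's database
population = [random.randint(18, 80) for _ in range(10_000)]

# Simple random sample without replacement: each customer is equally likely to be picked
sample = random.sample(population, k=500)

print("population mean:", round(mean(population), 1))
print("sample mean:", round(mean(sample), 1))
```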
Data modeling is the process of creating diagrams that visually represent how data is organized and structured. These visual representations are called data models. There are three types of data modeling:
- Conceptual data modeling: gives a high-level view of the data structure, such as how data interacts across an organization.
- Logical data modeling: focuses on the technical details of a database, such as relationships, attributes and entities.
- Physical data modeling: depicts how a database operates.
Wide data is data in which every data subject has a single row with multiple columns. All the information about that subject can be found in that single row.
Long data is data in which each row is one observation (for example, one time point) per subject, so each subject's data spans multiple rows.
Data transformation is the process of changing the data’s format, structure, or values.
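Reshaping between wide and long formats is a common data transformation (a change in structure). The sketch below converts wide rows (one row per country, one column per year) into long rows (one row per country and year) using plain Python; the data and the `wide_to_long` helper are hypothetical. In pandas, `pandas.melt` performs the same reshape.

```python
# Hypothetical wide data: one row per country, one column per year
wide_data = [
    {"country": "A", "2021": 10, "2022": 12},
    {"country": "B", "2021": 7, "2022": 9},
]


def wide_to_long(rows, id_col, var_name="variable", value_name="value"):
    """Reshape wide rows into long rows: one row per (subject, column) pair."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the identifier is repeated on each long row instead
            long_rows.append({id_col: row[id_col], var_name: key, value_name: value})
    return long_rows


long_data = wide_to_long(wide_data, id_col="country", var_name="year", value_name="sales")
for row in long_data:
    print(row)
```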
In a Google search, placing OR between terms expands the search to results that match either term.
Bias in data:
- sample bias
- observer bias
- interpretation bias
- confirmation bias
Data is not owned by the organization that spends time and resources collecting and transforming it. It is owned by the individuals who provide it, and they have control over how it is used and processed.
Data anonymization is the process of protecting people’s private or sensitive data by eliminating that kind of information. De-identification is a process used to wipe data clean of all personally identifying information.
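A minimal de-identification sketch in Python: drop the fields that directly identify a person. The field names in `PII_FIELDS` and the sample record are hypothetical; which fields count as identifying depends on your data and the regulations that apply to it.

```python
# Hypothetical list of directly identifying fields; adjust for your own data
PII_FIELDS = {"name", "email", "phone"}


def deidentify(record, pii_fields=PII_FIELDS):
    """Return a copy of the record without the listed identifying fields."""
    return {key: value for key, value in record.items() if key not in pii_fields}


# Hypothetical patient records
patients = [
    {"name": "Jane Doe", "email": "jane@example.com", "phone": "555-0100",
     "age_group": "30-39", "diagnosis_code": "J45"},
]
deidentified = [deidentify(p) for p in patients]
print(deidentified)
```

Removing direct identifiers is often not enough on its own: combinations of the remaining fields, such as ZIP code, birth date and gender, can still re-identify people, so treat this as a first step rather than full anonymization.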