Introduction

Data collection means gathering information to answer questions, make decisions, or understand something better.

Methods of Data Collection

Surveys

Surveys collect information by asking people questions.

Example: Asking classmates about their favourite ice cream flavour.

Good Survey Practices

  • Ask clear and simple questions
  • Keep the survey short
  • Use multiple-choice or rating questions
  • Keep answers anonymous
  • Test the survey first
  • Analyze results carefully
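
As a rough illustration of the last practice, the sketch below tallies answers to the ice cream question. The list of responses is made up for this example, and the function name tally is an assumption, not part of any survey tool.

```python
from collections import Counter

# Hypothetical survey answers: each classmate's favourite ice cream flavour.
responses = ["vanilla", "chocolate", "vanilla", "strawberry", "chocolate", "vanilla"]

def tally(answers):
    """Count how many times each answer appears."""
    return Counter(answers)

counts = tally(responses)

# Print the flavours from most to least popular.
for flavour, votes in counts.most_common():
    print(f"{flavour}: {votes}")
```

Counting each answer before drawing conclusions makes it harder to misjudge which option was really the most popular.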

Questionnaires

Written forms with a set of questions.

Example: A school questionnaire asking students about activities.

Interviews

One-on-one discussions for detailed information.

Example: Interviewing a teacher about their experience.

Observations

Watching real situations to understand behavior.

Online Data Sources

Websites, databases, and digital tools.

Data Extraction

Data extraction means selecting and saving only the important information from a large amount of data.
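
One possible way to picture extraction is keeping only the fields that matter for a question. In this minimal sketch the records, the field names, and the extract function are all hypothetical.

```python
# Hypothetical records with more detail than we need.
records = [
    {"id": 1, "name": "Asha", "class": "7A", "score": 72, "locker": 14},
    {"id": 2, "name": "Ben", "class": "7B", "score": 64, "locker": 9},
    {"id": 3, "name": "Chen", "class": "7A", "score": 81, "locker": 22},
]

def extract(rows, fields):
    """Return new records that contain only the requested fields."""
    return [{field: row[field] for field in fields} for row in rows]

# Keep only the name and score; the other fields are left behind.
important = extract(records, ["name", "score"])
print(important)
```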

Types of Data (Based on Storage)

Structured Data

Structured data is well-organized and easy to search. It is stored in rows and columns, like spreadsheets or databases.

Example: A table with student ID, name, class, date of birth, and fees.
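
Because every row has the same columns, structured data is easy to search with a few lines of code. The sketch below assumes a small table in CSV form; the student values and the find_by_class helper are invented for illustration.

```python
import csv
import io

# Hypothetical student table: one row per student, one column per field.
table_text = """student_id,name,class,date_of_birth,fees
101,Asha,7A,2012-03-14,1500
102,Ben,7B,2012-07-02,1500
103,Chen,7A,2011-11-30,1200
"""

# DictReader turns each row into a dictionary keyed by the column names.
rows = list(csv.DictReader(io.StringIO(table_text)))

def find_by_class(rows, class_name):
    """Return the names of all students in the given class."""
    return [row["name"] for row in rows if row["class"] == class_name]

print(find_by_class(rows, "7A"))
```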

Unstructured Data

Unstructured data has no fixed format. This data is harder to organize but still very useful.

Example: Emails, social media posts, images, videos, and text messages.

Data Visualization

Data visualization means showing data using pictures like charts and graphs. It helps us understand data easily by showing patterns and trends.

Importance

  • Makes data easy to understand
  • Saves time
  • Helps quickly see trends and comparisons
  • Easier than reading long lists of numbers
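
To see why a picture is quicker to read than a list of numbers, here is a small sketch that draws a bar chart out of text characters. The vote counts are made up, and text_bar_chart is a hypothetical helper; real projects would usually use a charting tool instead.

```python
def text_bar_chart(counts, symbol="#"):
    """Build a simple horizontal bar chart, one line per label."""
    width = max(len(label) for label in counts)
    lines = []
    for label, value in counts.items():
        # Pad the label so that all bars start in the same column.
        lines.append(f"{label.ljust(width)} | {symbol * value} {value}")
    return "\n".join(lines)

# Hypothetical survey results.
votes = {"vanilla": 5, "chocolate": 8, "strawberry": 3}
chart = text_bar_chart(votes)
print(chart)
```

The longest bar stands out at a glance, which is the same comparison a reader would otherwise have to make by reading every number.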

Data Pre-Processing & Analysis

Data Pre-Processing

Cleaning and organizing data before analysis.

Data Pre-Processing Techniques

1. Checking Data Quality

Ensure data is correct, complete, and up-to-date.

Example: Ensuring every enrolled student has an accurate, up-to-date test score recorded.

2. Common Problems

  • Errors: Wrong data values, e.g. a score recorded as 105 when the maximum is 100.
  • Outliers: Values that are unusually high or low compared to the rest of the data, e.g. one student scoring 5 when most students scored between 50 and 80.
  • Bias: Distortions that affect the accuracy of data, e.g. using survey results from one school to represent all schools in a city.
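
The first two problems can often be spotted automatically. The sketch below is one possible approach, not the only one: it flags scores outside the allowed range as errors, then uses the interquartile range (IQR) rule of thumb to flag outliers. The scores and function names are hypothetical.

```python
from statistics import quantiles

# Hypothetical test scores out of 100.
scores = [62, 55, 70, 105, 5, 78, 66, 59, 73, 68]

def find_errors(values, max_score=100):
    """Return values that cannot be right because they fall outside 0..max_score."""
    return [v for v in values if not 0 <= v <= max_score]

def find_outliers(values):
    """Return values more than 1.5 * IQR below Q1 or above Q3 (a common rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

errors = find_errors(scores)
# Check for outliers only among the scores that are possible.
valid = [v for v in scores if v not in errors]
outliers = find_outliers(valid)
print("Errors:", errors)
print("Outliers:", outliers)
```

An outlier is not always wrong; it only marks a value worth checking. Bias usually cannot be detected this way because it comes from how the data was collected.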

Validation & Cleaning

Validation: Check that data is complete and correct.

Cleaning: Fix or remove wrong data, fill missing values, and handle outliers.
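
A minimal sketch of both steps follows, assuming scores out of 100 and assuming that missing values are filled with the median of the valid scores. That fill rule is only one option; the names and data are hypothetical.

```python
from statistics import median

# Hypothetical raw data: None marks a missing score.
raw_scores = {"Asha": 72, "Ben": None, "Chen": 105, "Dina": 64}

def validate(scores, max_score=100):
    """Report each student whose score is missing or outside 0..max_score."""
    problems = {}
    for name, score in scores.items():
        if score is None:
            problems[name] = "missing"
        elif not 0 <= score <= max_score:
            problems[name] = "out of range"
    return problems

def clean(scores, max_score=100):
    """Remove out-of-range scores and fill missing ones with the median valid score."""
    valid = {n: s for n, s in scores.items()
             if s is not None and 0 <= s <= max_score}
    fill = median(valid.values())
    cleaned = dict(valid)
    for name, score in scores.items():
        if score is None:
            cleaned[name] = fill
    return cleaned

print(validate(raw_scores))
print(clean(raw_scores))
```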

Data Analysis Techniques

  • Quantitative: Uses numbers and measurements to find patterns and trends.
  • Qualitative: Uses non-numerical data like text, images, and sounds to understand meanings and experiences.
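
The two techniques can be contrasted with a short sketch. The numbers and comments below are invented, and counting words is only a very simple stand-in for qualitative analysis, which normally involves reading and interpreting the text.

```python
from collections import Counter
from statistics import mean, median

# Quantitative: summarise hypothetical test scores with numbers.
scores = [55, 62, 70, 78, 66]
average = mean(scores)
middle = median(scores)
print("Average:", average, "Median:", middle)

# Qualitative: look for repeated themes in hypothetical written feedback.
comments = ["fun and clear", "too long", "clear examples", "fun"]
word_counts = Counter(word for comment in comments for word in comment.split())
print(word_counts.most_common(2))
```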

Cloud Storage

Cloud storage keeps data on remote servers that are reached over the internet. It helps in saving files, sharing data, and accessing information from any device.

Benefits

  • Access from anywhere
  • Safe backups
  • Real-time sharing

Remote Access

Remote access allows you to use a computer or network from another location. You can open files or software even when you are not physically present.

Example: Google Drive and OneDrive let you open your files from another device.

Data Backups

Data backups are copies of important files stored separately to prevent data loss. They protect data from deletion, system failure, or viruses.

Example: Saving a copy of a school project on Google Drive or a USB drive.
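
Making a backup by hand can be sketched as copying a file into a separate folder. The back_up function, the timestamp in the file name, and the folder names below are assumptions for illustration, not a feature of any particular backup service.

```python
import shutil
from datetime import datetime
from pathlib import Path

def back_up(source, backup_dir):
    """Copy `source` into `backup_dir`, adding a timestamp to the file name."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    # Create the backup folder if it does not exist yet.
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    # copy2 copies the contents and also tries to keep the file's timestamps.
    shutil.copy2(source, target)
    return target

# Usage (hypothetical paths): back_up("project.docx", "E:/backups")
```

Keeping the backup on a different device or service from the original is what protects it if the original device fails.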

Automatic Backups

Devices can be set to automatically back up data to cloud services like OneDrive.

MCQs

1. What is data collection?

  • A) Data deletion
  • B) Gathering information
  • C) Data storage
  • D) Data visualization

Answer: B) Gathering information

2. Which method involves one-on-one discussion?

  • A) Survey
  • B) Observation
  • C) Interview
  • D) Questionnaire

Answer: C) Interview

3. Data stored in rows and columns is called:

  • A) Unstructured Data
  • B) Raw Data
  • C) Structured Data
  • D) Visual Data

Answer: C) Structured Data

4. Which of the following is an example of unstructured data?

  • A) Spreadsheet
  • B) Database table
  • C) Image
  • D) CSV file

Answer: C) Image

5. Data visualization helps to:

  • A) Hide data
  • B) Make data harder to understand
  • C) Understand trends easily
  • D) Delete data

Answer: C) Understand trends easily

FAQs

What is structured data?

Data that is well organized in rows and columns.

Why is data pre-processing important?

It improves data quality before analysis.