Data Science Updates per PR 628 (#745)
* updated english readme

* moved data science file to readme, added NumPy

Co-authored-by: niv-png <>
WalterMarch authored Feb 26, 2024
1 parent 30f2d9c commit e8ed9a8
Showing 2 changed files with 64 additions and 0 deletions.
60 changes: 60 additions & 0 deletions Data Science/readme.md
@@ -0,0 +1,60 @@
# Data Science Overview

Data Science is a multidisciplinary field that focuses on extracting valuable insights from data. It combines computer science, statistics, and domain knowledge to turn raw data into actionable information. This overview covers key components of Data Science, including data analysis, data visualization, statistical analysis, and popular libraries such as Matplotlib, NumPy, and Pandas.

## Key Components of Data Science

### Data Collection and Acquisition

Data Science projects start with collecting and acquiring data from various sources, such as databases, APIs, sensors, and web scraping.
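
As a minimal sketch of acquiring data from a database, the snippet below queries an in-memory SQLite database using Python's standard `sqlite3` module. The `readings` table and its contents are hypothetical and stand in for a real data source.

```python
import sqlite3

# In-memory database standing in for a real data source (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 20.5), ("b", 19.0), ("a", 21.5)],
)

# Acquire the data with a query; each row comes back as a tuple.
rows = conn.execute(
    "SELECT sensor, value FROM readings ORDER BY sensor, value"
).fetchall()
conn.close()

print(rows)
```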

### Data Cleaning and Preprocessing

Raw data is often messy and requires cleaning and preprocessing. This involves handling missing values, outliers, and formatting issues.
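
The three kinds of problems mentioned above can be sketched with Pandas as follows. The data, the column names, and the plausible age range of 0 to 120 are assumptions made for illustration; real projects need domain-specific rules.

```python
import pandas as pd

# Hypothetical raw data: one missing age, one implausible age,
# and inconsistent city formatting.
raw = pd.DataFrame({
    "age": [25, None, 31, 240, 28],
    "city": [" london", "Paris ", "paris", "London", "BERLIN"],
})

cleaned = raw.copy()

# Formatting issues: trim whitespace and normalize capitalization.
cleaned["city"] = cleaned["city"].str.strip().str.title()

# Outliers: treat ages outside the assumed plausible range as missing.
cleaned["age"] = cleaned["age"].where(cleaned["age"].between(0, 120))

# Missing values: fill with the median of the remaining valid ages.
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

print(cleaned)
```

Filling with the median is only one option; dropping incomplete rows or using a model-based imputation may suit other data sets better.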

### Exploratory Data Analysis (EDA)

EDA involves using statistical and visualization techniques to understand the data's characteristics, distributions, correlations, and potential patterns.
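
A first pass at EDA might look like the sketch below, which uses Pandas to summarize each column and measure how strongly two columns move together. The data set of study hours and exam scores is made up for the example.

```python
import pandas as pd

# Hypothetical data set: study hours and exam scores for six students.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 55, 61, 70, 74, 83],
})

# Summary statistics (count, mean, std, min, quartiles, max) per column.
summary = df.describe()
print(summary)

# Pairwise Pearson correlations between the numeric columns.
correlations = df.corr()
print(correlations)
```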

### Feature Engineering

Feature engineering is the process of creating new features or modifying existing ones to improve model performance.
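
As an illustration, the sketch below derives new columns from existing ones with Pandas. The order records and the chosen features are hypothetical; which features actually help depends on the model and the problem.

```python
import pandas as pd

# Hypothetical order records.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-02-10"]),
    "quantity": [2, 1, 4],
    "unit_price": [9.50, 20.00, 5.25],
})

# New feature created by combining existing columns.
orders["total_price"] = orders["quantity"] * orders["unit_price"]

# New features extracted from a date column.
orders["month"] = orders["order_date"].dt.month
orders["is_weekend"] = orders["order_date"].dt.dayofweek >= 5  # 5 = Saturday, 6 = Sunday

print(orders)
```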

### Machine Learning and Modeling

Data Scientists build predictive models using machine learning algorithms. This involves splitting the data into training and testing sets, model selection, training, and evaluation.
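
The split, train, and evaluate steps can be sketched with NumPy alone. The synthetic data, the 80/20 split, and the choice of a straight-line model are assumptions for this example; model selection is not shown, and real projects often use a dedicated machine learning library.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical synthetic data: y depends linearly on x, plus random noise.
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=100)

# Split: shuffle the indices, keep 80% for training and 20% for testing.
indices = rng.permutation(len(x))
train_idx, test_idx = indices[:80], indices[80:]

# Train: fit a straight line (degree-1 polynomial) by least squares.
slope, intercept = np.polyfit(x[train_idx], y[train_idx], deg=1)

# Evaluate: mean squared error on the held-out test set.
predictions = slope * x[test_idx] + intercept
mse = np.mean((y[test_idx] - predictions) ** 2)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MSE={mse:.2f}")
```

Evaluating on data the model never saw during training gives a more honest estimate of how it will perform on new data.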

## Data Visualization

Data visualization is crucial for communicating insights effectively. It uses charts, graphs, and plots to represent data visually. Common types of data visualizations include:

- Bar Charts
- Line Charts
- Scatter Plots
- Histograms
- Heatmaps
- Box Plots
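
Two of the chart types listed above can be drawn with Matplotlib as in the sketch below. The sales figures and the output file name `sales_charts.png` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # Non-interactive backend, so the script runs without a display.
import matplotlib.pyplot as plt

# Hypothetical monthly sales figures.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 160, 150]

# One figure with two side-by-side plots.
fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(8, 3))

ax_bar.bar(months, sales)
ax_bar.set_title("Bar chart")
ax_bar.set_ylabel("Sales")

ax_line.plot(months, sales, marker="o")
ax_line.set_title("Line chart")

fig.tight_layout()
fig.savefig("sales_charts.png")
```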

## Statistical Analysis

Statistical analysis is fundamental in Data Science and includes:

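- Descriptive Statistics: Summarizing a data set with measures like mean, median, mode, variance, and standard deviation.
- Inferential Statistics: Drawing conclusions about a population from a sample, using techniques like hypothesis testing and confidence intervals.
- Regression Analysis: Predicting a continuous dependent variable based on one or more independent variables.
- Hypothesis Testing: Deciding whether sample data provide enough evidence to reject an assumption (the null hypothesis) about a population.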
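
The descriptive measures and a confidence interval can be computed with Python's standard `statistics` module, as sketched below. The page-view sample is made up, and the interval uses a normal approximation as a simplifying assumption; for a sample this small, a t-distribution would usually be more appropriate.

```python
import math
import statistics

# Hypothetical sample: daily page views over nine days.
views = [12, 15, 15, 18, 20, 21, 15, 30, 25]

# Descriptive statistics.
mean = statistics.mean(views)
median = statistics.median(views)
mode = statistics.mode(views)
variance = statistics.variance(views)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(views)      # sample standard deviation

# Inferential statistics: approximate 95% confidence interval for the mean,
# assuming a normal approximation (a simplification for such a small sample).
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * std_dev / math.sqrt(len(views))

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance}, std_dev={std_dev:.2f}")
print(f"95% CI: {mean - margin:.2f} to {mean + margin:.2f}")
```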

## Popular Data Science Libraries

### Python

Python is one of the most popular languages used in Data Science and has many widely used libraries. Among them are:

- [Matplotlib](https://github.com/matplotlib/matplotlib)
  - Matplotlib can create static, animated, and interactive plots and visualizations. It offers a wide range of customizable plot types and styles.

- [NumPy](https://github.com/numpy/numpy)
  - NumPy adds support for multi-dimensional arrays and matrices, along with a variety of high-level mathematical functions that operate on them. It is so fundamental that many other libraries depend on it.

- [Pandas](https://github.com/pandas-dev/pandas)
  - Pandas helps with data manipulation and analysis. It provides data structures like DataFrames and Series, making it easy to clean, explore, and transform data.
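
As a small taste of NumPy's multi-dimensional arrays, the sketch below builds a matrix and applies a few operations to it. The values are arbitrary.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

shape = matrix.shape                 # (rows, columns)
column_means = matrix.mean(axis=0)   # mean of each column
product = matrix @ matrix.T          # matrix product with its transpose

print(shape)
print(column_means)
print(product)
```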
4 changes: 4 additions & 0 deletions README.md
@@ -40,6 +40,7 @@ If you're interested in contributing to this project, please take a moment to re
- [Algorithms](#algorithms)
- [Alan Turing](#alan-turing)
- [Software Engineering](#software-engineering)
- [Data Science](#data-science)
- [Integrated Circuits](#integrated-circuits)
- [Object Oriented Programming](#object-oriented-programming)
- [Functional Programming](#functional-programming)
@@ -377,6 +378,9 @@ The software engineering process involves several phases, including requirements
- Economics: In this sector, software engineering helps you estimate resources and control costs. A computing system must be developed, and data should be maintained regularly within a given budget.
- System Engineering: Most software is a component of a much larger system. For example, the software in an Industry monitoring system or the flight software on an airplane. Software engineering methods should be applied to the study of this type of system.

## [Data Science](Data%20Science/readme.md)

Data Science extracts valuable insights from often messy data by applying computer science, statistics, and knowledge of the domain under consideration. Examples include deriving customer sentiment from call records and building purchase recommendation systems from sales records.

## [Integrated Circuits](Integrated%20Circuits/readme.md)
An integrated circuit or monolithic integrated circuit (also referred to as an IC, a chip, or a microchip) is a set of electronic circuits on one small flat piece (or "chip") of semiconductor material, usually silicon. Many tiny MOSFETs (metal–oxide–semiconductor field-effect transistors) integrate into a small chip. This results in circuits that are orders of magnitude smaller, faster, and less expensive than those constructed of discrete electronic components. The IC's mass production capability, reliability, and building-block approach to integrated circuit design have ensured the rapid adoption of standardized ICs in place of discrete transistors. ICs are now used in virtually all electronic equipment and have revolutionized the world of electronics. Computers, mobile phones, and other home appliances are now inextricable parts of the structure of modern societies, made possible by the small size and low cost of ICs such as modern computer processors and microcontrollers.