Saturday, August 1, 2020

Sort data to analyze it

Managing Your Data

What is a database?

Almost every data scientist will spend time working in a database, which is an organized collection of structured data in a computer system. (Remember, structured data is usually organized in a table format with rows and columns, like the following example.)

Last four digits of social security number    Last name    Age
6881                                          Marshall     23
0121                                          Rodriguez    19
5538                                          Cho          59
2972                                          Parker       33
3154                                          Sawyer       72

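
As a rough illustration of rows and columns, the table above can be sketched in Python as a list of records, one dictionary per row, and then sorted by one of its columns. The key names ssn_last4, last_name, and age are labels assumed here for illustration; they are not part of any real schema.

```python
# Each dictionary is one row; each key is one column.
# The key names are illustrative assumptions, not a real schema.
rows = [
    {"ssn_last4": "6881", "last_name": "Marshall", "age": 23},
    {"ssn_last4": "0121", "last_name": "Rodriguez", "age": 19},
    {"ssn_last4": "5538", "last_name": "Cho", "age": 59},
    {"ssn_last4": "2972", "last_name": "Parker", "age": 33},
    {"ssn_last4": "3154", "last_name": "Sawyer", "age": 72},
]

# Sort the rows by the "age" column, youngest first.
by_age = sorted(rows, key=lambda row: row["age"])

for row in by_age:
    print(row["ssn_last4"], row["last_name"], row["age"])
```

The digits are stored as text rather than as numbers so that a leading zero, as in 0121, is not lost.
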
Most databases today are organized as relational databases, which are collections of multiple data sets or tables that link together.
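SQL (Structured Query Language) is the underlying language that drives most work done in relational databases, but there are many relational database management systems (RDBMSs) in which you can do that work. As you venture into this field, you’ll run into names like these:
  • MySQL
  • Microsoft Access
  • PostgreSQL
  • Oracle
  • IBM DB2
  • MongoDB (a document-oriented NoSQL database rather than an RDBMS, but a name you will meet alongside the others)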
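
To make the idea of linked tables more concrete, here is a minimal sketch that uses SQLite, chosen only because it ships with Python; it is not one of the systems listed above. The visits table and its rows are hypothetical, invented purely to show a join, and using the last four digits as a key is an illustrative shortcut that would not be unique in real data.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two tables that link together through the shared ssn_last4 column.
conn.execute(
    "CREATE TABLE people (ssn_last4 TEXT PRIMARY KEY, last_name TEXT, age INTEGER)"
)
conn.execute(
    "CREATE TABLE visits (ssn_last4 TEXT REFERENCES people(ssn_last4), visit_year INTEGER)"
)

conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("6881", "Marshall", 23), ("0121", "Rodriguez", 19), ("5538", "Cho", 59)],
)
# Hypothetical rows, invented only to demonstrate the join.
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("6881", 2019), ("5538", 2020), ("6881", 2020)],
)

# Join the tables on the shared column and sort the result by age.
query = """
    SELECT p.last_name, p.age, v.visit_year
    FROM people AS p
    JOIN visits AS v ON v.ssn_last4 = p.ssn_last4
    ORDER BY p.age, v.visit_year
"""
results = conn.execute(query).fetchall()
for last_name, age, visit_year in results:
    print(last_name, age, visit_year)

conn.close()
```

The ORDER BY clause is where the sorting happens: the database returns the joined rows already arranged by age, so the analysis code does not have to sort them again.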

Choose the right tools to manage data

Where do you begin? There are dozens of useful data science tools and platforms! Here’s a list of some popular platforms, most of them open source, that you can use to begin your own data science journey.

R is a good place to start
R is a programming language and free software environment often used for statistical analysis and data science. Many would-be data scientists start with R itself or with one of its popular interfaces, and hundreds of R packages extend it, such as ggplot2 for data visualization.
Python works for general purposes
Python is a popular, general-purpose programming language that can also be used for data science. Pair it with a library like pandas and a useful interface, and Python can help you create new insights and data visualizations.
MATLAB helps crunch numbers
MATLAB is a proprietary language and environment built to focus on numerical computing. It is often used in higher education.
Apache Spark supports big data and machine learning
Apache Spark is an open source, general-purpose framework that can be especially useful for extremely large data sets and the machine learning that uses them.
