Thursday, August 6, 2020


  • A statement or expression is an instruction the computer will run or execute.
  • The value in the parentheses is called the argument.

  • A semantic error is when your logic is wrong: the code runs, but it does not do what you intended.



    Expressions describe a type of operation that computers perform.

    1. We can bind a string to another variable.
    2. It is helpful to think of a string as a list or tuple.

    We can treat the string as a sequence and perform sequence operations such as indexing and slicing.

    We can also supply a stride value as a third number when slicing. A stride of 2 indicates that we select every second character.
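A minimal Python sketch of the sequence operations described above. The string "Michael Jackson" is an assumed example value (the original example was not preserved); the same indexing and slicing would work on a list or tuple.

```python
# A string is a sequence of characters, so indexing and slicing work
# the same way they do for lists and tuples.
name = "Michael Jackson"  # assumed example value

first_char = name[0]     # first character
last_char = name[-1]     # negative indices count from the end
first_word = name[0:7]   # the stop index 7 is excluded

# A third slice value is the stride: 2 takes every second character.
every_second = name[::2]
# The stride can be combined with start and stop indices.
start_every_second = name[0:5:2]

print(first_char, last_char, first_word)
print(every_second)
print(start_every_second)
```

Slices exclude the stop index, which is why name[0:7] ends just before the space.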

    1. We set the variable A to a string value.
    2. We apply the method "upper" and assign the result to "B".
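The two steps above look like this in Python; the value of A is an assumed example.

```python
A = "Thriller is a song"  # assumed example value

# upper() returns a new, upper-case copy; strings are immutable,
# so A itself is left unchanged.
B = A.upper()

print(A)
print(B)
```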

    1. The method find finds substrings. The argument is the substring you would like to find, and the output is the index of the first character of the first match.
    2. We can find the substring Jack. If the substring is not in the string, the output is negative one (-1).
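A short sketch of find() on an assumed example string. The related method index() performs the same search but raises a ValueError instead of returning -1 when the substring is missing.

```python
name = "Michael Jackson"  # assumed example value

# find() returns the index where the first match starts.
pos_el = name.find("el")
pos_jack = name.find("Jack")

# If the substring does not occur, find() returns -1 instead of raising an error.
pos_missing = name.find("Jasdfasdasdf")

print(pos_el, pos_jack, pos_missing)
```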

    MODULE 2

    Lists and tuples are called compound data types.
    Tuples are ordered sequences.
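A small sketch of tuple behavior in Python, using an assumed example tuple. Besides being ordered, tuples are immutable, which is the main practical difference from lists.

```python
# A tuple is written with parentheses and can mix types.
ratings = ("disco", 10, 1.2)

# Because tuples are ordered, they support indexing and slicing.
first = ratings[0]
middle = ratings[1:3]

# Unlike a list, a tuple is immutable: item assignment raises TypeError.
try:
    ratings[0] = "rock"
    was_modified = True
except TypeError as err:
    was_modified = False
    print("cannot modify a tuple:", err)

# A list is the mutable counterpart; converting gives an editable copy.
ratings_list = list(ratings)
ratings_list[0] = "rock"

print(first, middle, ratings_list)
```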

    Saturday, August 1, 2020

    Sort data to analyze it

    Managing Your Data

    What is a database?

    Almost every data scientist will spend time working in a database, which is an organized collection of structured data in a computer system. (Remember, structured data is usually organized in a table format with rows and columns, like the following example.)

    Last four digits of Social Security number | Last name | Age
    Most databases today are organized as relational databases, which are collections of multiple data sets or tables that link together.
    While SQL is the underlying language that drives most work done in relational databases, there are many relational database management systems (RDBMSs) in which you can do that work. As you venture into this field, you’ll run into names like these:
    • MySQL
    • Microsoft Access
    • PostgreSQL
    • Oracle
    • IBM DB2
    • MongoDB (a NoSQL document database rather than a relational one)
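To make "tables that link together" concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. SQLite is not in the list above; it is used only because it ships with Python. The customers/orders tables and their values are hypothetical. A similar SQL JOIN would run with little or no change in the relational systems listed.

```python
import sqlite3

# Throwaway in-memory database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two hypothetical tables linked through the customer_id column.
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, last_name TEXT, age INTEGER)")
cur.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), amount INTEGER)"
)
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)", [(1, "Smith", 34), (2, "Lee", 28)])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(101, 1, 20), (102, 1, 5), (103, 2, 42)])

# The JOIN combines rows from both tables wherever the key values match.
rows = cur.execute(
    """
    SELECT c.last_name, COUNT(o.order_id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.last_name
    """
).fetchall()
conn.close()

print(rows)
```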

    Choose the right tools to manage data

    Where do you begin? There are dozens of useful data science tools and platforms! Here’s a list of some popular platforms, most of them open source, that you can use to begin your own data science journey.

    R is a good place to start
    R is a programming language and free software environment often used for statistical analysis and data science. Many would-be data scientists start with this tool or with one of the popular R interfaces, and there are hundreds of useful packages in R that help with data visualization such as ggplot2.
    Python works for general purposes
    Python is a popular, general-purpose programming language that can also be used for data science. Pair it with a library like pandas and a useful interface, and Python can help you create new insights and data visualizations.
    MATLAB helps crunch numbers
    MATLAB was built to focus on numerical computing. It is often used in higher education.
    Apache Spark supports big data and machine learning
    Apache Spark is an open-source, general-purpose framework that can be especially useful for extremely large data sets and the machine learning that uses them.

    Tuesday, July 14, 2020



    Data Cleaning and Blending

    The NYT dataset doesn’t include information about county population, so I’m going to merge the two datasets into one using Python and pd.merge().
    A merge in pandas is similar to a SQL JOIN or a VLOOKUP in Excel.

    Before we do that, we first need to clean the data.
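A minimal sketch of the clean-then-merge step with pandas. The two DataFrames are tiny made-up stand-ins; the column names (county, fips, cases, population) and the use of the county FIPS code as the join key are assumptions for illustration, not necessarily what the real files use. The cleaning targets the kind of mismatches that make a merge silently lose rows: missing keys, stray whitespace, and keys stored in different formats.

```python
import pandas as pd

# Made-up stand-ins for the two datasets (column names are assumptions).
cases = pd.DataFrame({
    "county": ["Alpha", "Beta ", "Gamma", "Unknown"],
    "fips": [1001.0, 1003.0, 1005.0, None],  # float key, one row without a key
    "cases": [100, 80, 120, 5],
})
population = pd.DataFrame({
    "fips": ["01001", "01003", "01005"],  # key stored as zero-padded strings
    "population": [50000, 200000, 25000],
})

# Cleaning: drop rows without a join key, trim stray whitespace, and convert
# the key to the same zero-padded string format as the population table.
cases_clean = cases.dropna(subset=["fips"]).assign(
    county=lambda d: d["county"].str.strip(),
    fips=lambda d: d["fips"].astype(int).astype(str).str.zfill(5),
)

# Merge on the FIPS code, much like a SQL LEFT JOIN or an Excel VLOOKUP.
merged = pd.merge(cases_clean, population, on="fips", how="left", validate="many_to_one")

# Example use of the added column: cases per 100,000 residents.
merged["cases_per_100k"] = merged["cases"] / merged["population"] * 100_000
print(merged)
```

validate="many_to_one" makes pd.merge raise an error if a FIPS code appears more than once in the population table, and how="left" keeps every case row even when no population match exists, so unmatched counties show up as NaN instead of disappearing.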

    Check the GitHub repo here.

    Monday, July 13, 2020

    Support Vector Machines

    Support Vector Machines, Clearly Explained!!!

    Plot an ROC Curve in Python

    How to Plot an ROC Curve in Python | Machine Learning in Python

    ROC and AUC, Clearly Explained!

    ROC and AUC in R
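As a companion to the ROC/AUC links above, a dependency-free sketch of what an ROC curve is: sort the samples by score, lower the threshold one distinct score at a time, and record the false positive rate and true positive rate at each step. The AUC is then the area under those points, computed here with the trapezoidal rule. The labels and scores are a small made-up example.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, sweeping the threshold from high to low."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    # Sort samples by score, highest first.
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Samples tied at the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC points using the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(fpr)):
        area += (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
    return area


y_true = [0, 0, 1, 1]            # made-up labels
scores = [0.1, 0.4, 0.35, 0.8]   # made-up classifier scores
fpr, tpr = roc_points(y_true, scores)
area = auc(fpr, tpr)
print(fpr, tpr, area)
```

To draw the curve, the fpr and tpr lists can be passed to matplotlib, e.g. plt.plot(fpr, tpr). On real model outputs, sklearn.metrics.roc_curve and sklearn.metrics.roc_auc_score compute these quantities.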

    Machine learning and data science in the cloud

    Run machine learning and data science workloads in the cloud on high-performance GPUs at no cost

    Google Colaboratory supports Python version 2.7 and 3.6
    I saw an example of how to use Swift in Colab a while ago.

    Please try the newer version here:

    Or try a Kaggle R Jupyter notebook, which supports R and RStan by default:

    How to use R and Python in same notebook on Google Colab

    How to Build Your First Data Science Web App in Python (Streamlit Tutorial Part 1)

    Sunday, July 12, 2020


    I have a project going on and wanted to check whether you can help me with any of my challenges.

    Project: Comparative analysis of stock returns around M&A announcements.
    I need an expert/consultant in data analysis and econometrics with access to professional databases and, ideally, knowledge of Python, R, Stata or similar tools, to conduct an event study with multiple event windows on a large dataset.
    I have identified a list of ~17k transactions (the sample) that fulfil the selection criteria. I intend to conduct an event study to find abnormal returns for the acquirer, the target and both combined. The results will then be presented.
    My challenges:
    - Reducing the sample: yes or no?
    - Identifying the right indices or basket of comparable stocks (by industry, geography, liquidity) to use as a proxy for the market portfolio to regress against
    - Sourcing the data for the large number of transactions (acquirer, target, market portfolio)
    - Cleaning the data and making sure that there is an equal number of observations for each transaction
    - Estimating normal returns based on the market portfolio chosen for each specific event
    - Cumulative abnormal returns for all acquirers, targets and both combined (whole sample)
    - Testing for significance
    - Dividing the data into two cohorts based on one simple selection criterion
    - Cumulative abnormal returns for all acquirers, targets and both combined (Cohort 1 and Cohort 2)
    - Testing for significance
    • The candidate must have proven knowledge and understanding of conducting event studies, data analysis and econometrics.
    • The candidate should have a good understanding of the academic literature surrounding event studies.
    • The candidate must have access to professional databases like Bloomberg, Datastream, CapitalIQ or comparable.
    - A solution that requires as little manual intervention as possible and can be reused with a different data set.
    - Active, hands-on support in the role of a consultant
    - Model documentation and validation including relevant tables and graphs and descriptive statistics
    - Source code, if any
    Budget: 250 dollars, delivery time: 7 days (Jul. 16, 2020)
    Tool: Python
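For reference, the core of several challenges listed above (estimating normal returns, computing abnormal and cumulative abnormal returns, testing significance) can be sketched in Python under the standard market-model approach. The market model is an assumption here, since the post does not name a model. The returns below are synthetic and chosen so the fit is exact; real estimation windows are much longer. The simple t-statistic ignores estimation error in alpha and beta and is only one of several tests used in event studies.

```python
import math


def ols(x, y):
    """Fit y = alpha + beta * x by ordinary least squares; return (alpha, beta)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


def event_car(stock_est, market_est, stock_evt, market_evt):
    """Market-model abnormal returns, their sum (CAR) and a simple t-statistic."""
    # 1. Estimation window: fit normal returns R_stock = alpha + beta * R_market.
    alpha, beta = ols(market_est, stock_est)
    residuals = [s - (alpha + beta * m) for s, m in zip(stock_est, market_est)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(e * e for e in residuals) / (len(residuals) - 2))
    # 2. Event window: abnormal return = actual return - predicted normal return.
    ar = [s - (alpha + beta * m) for s, m in zip(stock_evt, market_evt)]
    car = sum(ar)
    # 3. With independent errors, Var(CAR) = L * sigma^2 for an L-day window.
    t_stat = car / (sigma * math.sqrt(len(ar)))
    return ar, car, t_stat


# Synthetic daily returns (made up). The noise sums to zero and is uncorrelated
# with the market returns, so the fit recovers alpha = 0.001 and beta = 1.5.
market_est = [-0.02, -0.01, 0.0, 0.01, 0.02]
noise = [0.001, -0.002, 0.002, -0.002, 0.001]
stock_est = [0.001 + 1.5 * m + e for m, e in zip(market_est, noise)]

# Event window: the stock beats its normal return by 0.01 each day.
market_evt = [0.01, 0.0, -0.01]
stock_evt = [0.001 + 1.5 * m + 0.01 for m in market_evt]

ar, car, t_stat = event_car(stock_est, market_est, stock_evt, market_evt)
print([round(a, 6) for a in ar], round(car, 6), round(t_stat, 2))
```

For the whole sample, the per-event CARs would then be averaged across events (the cumulative average abnormal return, CAAR) and tested cross-sectionally, and the same steps would be repeated for each cohort.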