11 minute read

mysql_docker_myphpadmin.png

Previously, the bicycle rides data from Bixi were web-sraped and stored locally. To store the data, we explore how to set up MySQL and phpMyAdmin using Docker containers without installing MySQL, phpMyAdmin, and their dependencies locally.

Motivation and Introduction

What are the tools and why?

MySQL

  • MySQL is an open-source relational database management system (DBMS). Although not the latest, it is one of the world’s most popular DBMS.

phpMyAdmin

  • Without a user interface, MySQL can only be interacted with through Terminal (or PowerShell and Command Prompt depending on OS). phpMyAdmin addresses this problem by being an open-source portable web application that acts as an administration tool for MySQL. An alternative would be MySQL Workbench, but that requires installation on local machine and defeats the purpose of using Docker.

Docker

  • Docker is a tool that packages software into containers independent of the local system. Docker was used to run MySQL and phpMyAdmin on the local computer without installation, and can be used to package the entire setup in the future. Read here for more details on Docker.

Why store the data in a database using DBMS?

Consolidation

  • The web-scraped data are stored across multiple Comma Separated Value (CSV) files, which makes it more difficult to access all the data since the data files must be loaded individually from each CSV and combined before use.

Tracking

  • With change being the only constant, no data is expected to be perfect or clean. Using a database with DBMS like MySQL allows for logging and tracking of updates to data.

Structure, Relationships and Normalization

  • On top of being a DBMS, MySQL is also a relational DBMS, meaning that different data tables can be connected through specified relationships between columns. For example, a bicycle docking station can store many bicycles. Furthermore, large data tables (many columns) can be broken down into smaller tables which are connected through relationships. This is known as normalization and the benefits include smaller data tables and data completeness.

Data Authority, Sharing and Security

  • Rather than sending the data files through email, other users can access the same consolidated data stored in a database. This reduces the risks of having multiple copies (truths) of data among users, which causes confusion and may lead to wasted time or even wrongly justified decision making. Finally, MySQL also provides a way to manage access levels to data among users.

Wait, isn’t Docker for Software Development?

The ability to containerize applications provided by Docker is essential in the field of data science for the following reasons:

Ease of Installation

  • Using Docker containers avoids the need to install MySQL and all its dependencies on your local computer. Instead of spending hours installing and configuring MySQL to work with phpMyAdmin over many computers, using Docker containers allows users to get MySQL running with phpMyAdmin in less than 5 minutes consistently across many computers.
  • Docker is also useful when setting up specific environments for machine learning to be shared among colleagues, and is even more so when bringing the trained models to production in the cloud.

Sharing and Scaling

  • In the world of big data, the data size is usually too large to store in any single computer. Docker containers allow users to transition their containerized DBMS and volumes onto the cloud, skipping the hassle of reconfiguring setup to match cloud specifications.

Operations and Modularization

  • Containerized applications can be easily updated or replaced with another container if the current one fails. This means that the entire operations process of DBMS deployment can be modularized into independent components. Individual components (containers) can be replaced without the need to rebuild all operations, reducing database downtime.

Overview of Setup

docker_network_setup.png

The image above represents an overview of the DBMS setup. The containers with red caps represent Docker containers running MySQL and phpMyAdmin Docker images. Both the MySQL and phpMyAdmin containers can communicate with each other because they exist in a Docker network, represented by the cloud. The actual data is stored outside the cloud in a data warehouse, which in this case is just the local computer. Note that in the data warehouse, data is stored as Docker volumes, which can be pushed and scaled on cloud services like AWS. The cleaned raw data is ingested through MySQL and stored in the data warehouse. Finally, dashboards and machine learning can access the data through MySQL in the Docker network.

Requirements

To run the above setup, you will need:

  • Docker installed on your local computer
  • Terminal to access Docker
  • a web browser, Google Chrome was used in this article
  • (optional) a code editor for writing Docker-compose script, VSCodium was used

Notes on Image Version

For this example, MySQL was pulled using the version 8.0 tag on Docker Hub, a public repository for Docker images. Docker images are used to build Docker containers. Similarly, phpMyAdmin was also pulled from Docker Hub, with version 5.2 specified.

Note, Docker images pulled from Docker Hub become Docker containers when they are run on Docker Engine, all of which runs on the local machine. (Read more here) What this implies practically is that Docker containers are still dependent on the CPU of the local host, meaning that some Docker images which were built to run on Intel chips may not work on the newer ARM chips like Apple’s M1 and M2. However, many Docker images are being or have already been updated to run on ARM chips at the time of this article. This is usually specified on the Docker Hub page of the desired image.

Deployment of DBMS Setup using Docker-compose

The first step is to create a Docker-compose script that tells Docker what containers to setup. Once defined, Docker-compose automatically handles:

  • the pulling of Docker images from Docker Hub
  • the creation of a Docker network
  • the creation of Docker containers from Docker images
  • the connection between Docker containers within the network
  • the connection between the Docker containers to volumes outside the network.

The Docker-compose script is written as a YAML file, which is a human readable data serialization language. A GitHub repo for the script can be found here

The script is named mysql.yaml, and is included below with explanations for each line of code.

# Specify version of docker-compose for backwards compatability
version: '3.8'

# Define services (docker containers) to run
services:
  
  # Define a container named mysql to run MySQL DBMS
  mysql:

    # Specify image tag to reference, refer to Docker Hub https://hub.docker.com/_/mysql
    image: mysql:8.0

    # Tells Docker to restart container if fail
    restart: always

    # Define environment variables, different for each image
    environment:
      # rootroot was used in this case as an example
      MYSQL_ROOT_PASSWORD: rootroot

    # Define ports for communication between host (left) and container(right)
    ports:
      - '3306:3306'

    # Define volumne to write data to for data persistence when container restarts
    # Local host directory (left) : directory in container (right)
    volumes:
      - mysql_db_data:/var/lib/mysql

  # Define a container to run phpMyAdmin
  phpmyadmin:

    # Specify phpmyadmin to run after mysql has started
    depends_on:
      - mysql

    # Specify image tag to reference, refer to Docker Hub https://hub.docker.com/_/phpmyadmin
    image: phpmyadmin:5.2

    # Tells Docker to restart container if fail
    restart: always

    # Define ports for communication between host (left) and container(right)
    ports:
      - '8080:80'

    # Define link from mysql container to phpmyadmin container
    links:
      - mysql:db

# Define Docker volumes to store data
volumes:

  # Specify volume name
  mysql_db_data:

    # Tells docker that volume is stored on local computer
    driver: local

Now that the Docker-compose script is defined, the beauty of using Docker is that the entire MySQL and phpMyAdmin setup can be executed in 1 line of code as shown below:

# Run below line in terminal, make sure you are in same directory as YAML file
docker-compose -f mysql.yaml up

Screenshot%202023-06-21%20at%2012.21.38.png

Once the Docker network and containers are up and running, type the following path into a web browser of your choice to access phpMyAdmin.

# Note, the port 8080 was specified in the YAML file and can be changed accordingly in case of port conflict
http://localhost:8080/

Screenshot%202023-06-21%20at%2012.25.10.png

Within phpMyAdmin, the usual database operations like creating tables and defining primary keys can be performed.

Finally, when done, the entire setup can be shut down using the down command, which automatically shuts down and removes the Docker containers and networks. The databases and data are stored in the specified Docker volume in the YAML file, and is persistent when the Docker-compose YAML file is initiated again.

# Run below line in terminal, make sure you are in same directory as YAML file
docker-compose -f mysql.yaml down

Conclusion and Next Steps

In this article, we explored how Docker containers can be used to conveniently setup MySQL and phpMyAdmin on a local computer without the need to deal with installation and dependencies. For now, the data is stored on the local computer. But, in the future when more data is acquired, the entire Docker setup can be packaged and hosted on cloud services like AWS, enabling scalability. In future articles, the Bixi rides data will be loaded into the setup database for further cleaning and use in analysis, visualizations and machine learning models.

Appendix

Here we will explore the step by step progression of building the docker-compose YAML file starting from basic commands.

A01 Install Docker and Ensure it is running

Docker can be installed using instructions here.

Then, Docker can be verified to be running through below methods:

1) Visible Desktop icon of Docker:

Screenshot%202023-06-19%20at%205.40.22%20PM.png

2) Using terminal:

# Input below to verify both Docker client and server(Engine) are running
docker version

Screenshot%202023-06-19%20at%206.01.50%20PM.png

A02 Pull Images from Docker Hub

A benefit of Docker is that the large community base and companies have already constructed and uploaded ready-to-use Docker images on Docker Hub. Each image may contain instructions specific to the image. The images used in this article are:

The images can be pulled using the Docker pull command:

# If unspecified, the latest version of the image will be pulled
# Specify version after the ':' symbol
docker pull mysql:8.0
docker pull phpmyadmin:5.2

Screenshot%202023-06-19%20at%206.04.08%20PM.png

To check if images are pulled, the ls command can be chained with the image command:

# Check pulled (downloaded) images
docker image ls

Screenshot%202023-06-19%20at%206.05.21%20PM.png

A03 Creating a Network to run Containers

A common network is needed to enable the MySQL and phpMyAdmin containers to communicate effectively with each other. To this end, the create command can be chained with the network command:

# input the name of the network after `create`
docker network create mysql_network

# similar to images, use `ls` to check created networks
docker network ls

Screenshot%202023-06-19%20at%206.13.46%20PM.png

Along with the created network mysql_network, the other 3 networks are default Docker networks that should be left running.

A04 Create Docker Containers from Images

To create a Docker container within the network, the attribute --net may be specified in the run command. Note that the run command automatically pulls an image from Docker Hub if no local images were found.

# Docker run command to create a container in the network

# run in detached mode, meaning that the terminal is left free to use, requires manual shut down
docker run -d \ 

# the name of the container, specified as 'mysql'
--name mysql \

# environment variables, specific to each image
-e MYSQL_ROOT_PASSWORD=rootroot \

# Docker network in which container should run
--net mysql_network

# The image to build a container from
mysql:8.0

Screenshot%202023-06-20%20at%2015.48.24.png

Unlike images and networks, to view docker containers, the ps command is used:

# to show active containers
docker ps

# to show all containers, including inactive ones
docker ps -a

Screenshot%202023-06-20%20at%2015.49.08.png

To further verify if MySQL is running in the container, the exec command can be combined with -it to start an interactive terminal running within the container. Then, we can verify if MySQL is running by typing mysql -V:

# To open a terminal inside the container
# -it stands for interactive
# /bin/bash opens a bash terminal
docker exec -it mysql /bin/bash

# Check version of MySQL in container
mysql -V

Screenshot%202023-06-19%20at%206.22.54%20PM.png

A similar step can be performed for the phpMyAdmin container with the addition of two arguments, --link and -p:

# Docker run coemmand to initialize phpmyAdmin container
docker rund -d \      # run in detached mode
--name myphpadmin \   # define container name to be phpmyadmin

# specify the link between the previously created 'mysql' container to the 'db' port within phpmyadmin
--link mysql:db \ 
--net mysql_network \ # specify docker network

# specify port on local host (8080) to port in container (80)
-p 8080:80 \
phpmyadmin:5.2        # image to build container from

Screenshot%202023-06-20%20at%2015.52.39.png

A05 Use phpMyAdmin to Access MySQL

Previously we defined port 8080 as the entry point in the local machine to access port 80 in the phpMyAdmin container. Thus, inputing localhost:8080 into a web browser will bring up the phpMyAdmin page, where the username is root and the password was defined as rootroot.

Screenshot%202023-06-20%20at%2015.53.55.png

After logging in, phpMyAdmin can be used to administer MySQL.

Screenshot%202023-06-20%20at%2015.55.57.png

A06 Packaging Commands into Docker Compose

The details behind each line of code within the Docker-compose YAML file was explained above and in this GitHub repository. Two notable differences:

  • the network is automatically defined using Docker-compose
  • specifications for persistent data storage in a Docker volume are specified in the YAML file

docker_compose_comparison.png

References and Acknowledgements