Also known as Computer Vision (CV)

Lecture on Information Science

  • Definitions of “CV,” “Computer Graphics (CG),” and “Image Processing”

    • CV extracts “shape/appearance/movement/meaning” from images/videos
    • The inverse of CV: Computer Graphics (CG)
      • CG creates “images/videos” from “shape/appearance/movement/meaning”
      • That is, the opposite direction of CV
    • Image Processing transforms “images/videos” into different “images/videos”
      • For example, changing colors
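The three directions above can be made concrete with a minimal Python sketch. The tiny nested-list image and the function names `invert_colors`, `mean_brightness`, and `render_gradient` are hypothetical illustrations, not something from the lecture, and mean brightness is only a stand-in for the much richer “meaning” that real CV extracts.

```python
# A tiny 2x3 RGB "image": rows of (r, g, b) pixels with 8-bit channels.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(10, 20, 30), (200, 200, 200), (0, 0, 0)],
]


def invert_colors(img):
    """Image Processing direction: image in, a different image out."""
    return [[(255 - r, 255 - g, 255 - b) for (r, g, b) in row] for row in img]


def mean_brightness(img):
    """CV direction (toy version): image in, a description (one number) out."""
    values = [(r + g + b) / 3 for row in img for (r, g, b) in row]
    return sum(values) / len(values)


def render_gradient(width, height):
    """CG direction (toy version): a description (a size) in, an image out."""
    steps = max(width - 1, 1)  # avoid dividing by zero for 1-pixel-wide images
    row = [(255 * x // steps,) * 3 for x in range(width)]
    return [list(row) for _ in range(height)]


print(invert_colors(image)[0][0])   # image -> image
print(mean_brightness(image))       # image -> description
print(render_gradient(3, 1))        # description -> image
```

The only point of the sketch is the input and output types: Image Processing maps an image to an image, CV maps an image to a description, and CG maps a description to an image.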
  • (blu3mo) The above seems slightly different from the definition in the book “Digital Image Processing”

    • The image posted on the Image Processing page seems to present a slightly different definition
  • What makes CV hard is that its difficulty is hard for the general public to appreciate

    • It is hard to convey why recognizing a plastic bottle right in front of you as a plastic bottle is a challenge for a computer
    • (blu3mo) There also seems to be a view that Programming Education should convey this kind of understanding
      • What computers can/cannot do
  • Topics

    • Feature point detection and matching
    • Structure from Motion (reconstructing shape as point cloud data from multiple images)
    • Computational Photography
    • 3D reconstruction
    • Image recognition (such as YOLO)
    • Recently, Fairness has also drawn attention (e.g., AI exhibiting racial bias)
  • Image Recognition

    • Combination: Machine Learning × Computer Vision × Natural Language Processing

    • Object detection, semantic segmentation, etc.

    • History

      • SIFT: Local features built from histograms of gradient orientations (weighted by gradient magnitude)
      • Bag of Visual Words: Applying the Bag of Words method from Natural Language Processing
      • Image datasets: Caltech-101
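The two classic ideas in this history can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the actual SIFT pipeline (which adds scale-space keypoint detection, a 4×4 grid of histograms, and normalization): `orientation_histogram` builds one magnitude-weighted histogram of gradient orientations for a grayscale patch, and `bag_of_visual_words` counts how many descriptors fall nearest to each entry of a given vocabulary. Both function names, the 8-bin choice, and the toy data are hypothetical.

```python
import math


def orientation_histogram(patch, bins=8):
    """Histogram of gradient orientations; each vote is weighted by gradient magnitude."""
    hist = [0.0] * bins
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the image gradient.
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue  # flat region: no orientation to vote for
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += magnitude
    return hist


def bag_of_visual_words(descriptors, vocabulary):
    """Count descriptors per nearest visual word, like word counts in Bag of Words."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        nearest = min(
            range(len(vocabulary)),
            key=lambda i: math.dist(descriptor, vocabulary[i]),
        )
        counts[nearest] += 1
    return counts


# A patch whose intensity increases left to right: all votes land in bin 0.
ramp = [[0, 1, 2, 3] for _ in range(4)]
print(orientation_histogram(ramp))

# Toy 2-D "descriptors" assigned to a 2-word vocabulary.
print(bag_of_visual_words([(1, 0), (9, 9), (0, 1)], [(0, 0), (10, 10)]))  # [2, 1]
```

In practice the vocabulary would be obtained by clustering many descriptors (for example with k-means); here it is simply given so that the counting step stays visible.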
    • Methods (basic ones)

      • To improve recognition,
        • Use filter images (kernels) suited to recognition rather than raw data from parts of images
        • Use richer representations than One-hot encoding
          • This parallels how NLP developed
      • Conceptually, shrink the spatial dimensions (height and width) of the image while expanding the depth (channel) dimension
        • Hierarchical structure
        • (blu3mo) Similar to the diagram in the CNN section
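The “compress the image, expand the depth” idea can be checked with simple shape arithmetic. The sketch below uses the standard output-size formula for a convolution, floor((size + 2·padding − kernel) / stride) + 1; the layer list (three 3×3 convolutions with stride 2, padding 1, and 16/32/64 output channels) and the 32×32 RGB input are made-up values for illustration, not a network from the lecture.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height, width, channels, layers):
    """Return the (height, width, channels) shape after each layer.

    Each layer is a tuple (out_channels, kernel, stride, padding).
    """
    shapes = [(height, width, channels)]
    for out_channels, kernel, stride, padding in layers:
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
        channels = out_channels
        shapes.append((height, width, channels))
    return shapes


# Hypothetical stack: the spatial size halves at every layer while depth doubles.
layers = [(16, 3, 2, 1), (32, 3, 2, 1), (64, 3, 2, 1)]
for shape in trace_shapes(32, 32, 3, layers):
    print(shape)
# (32, 32, 3)
# (16, 16, 16)
# (8, 8, 32)
# (4, 4, 64)
```

Each step trades spatial resolution for more channels, which is the hierarchical structure described above.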
  • (blu3mo) As long as we work with datasets collected by humans, the models adapt and overfit to human cognition

    • Well, that’s the goal
    • This might lead to a philosophical discussion about whether image recognition amounts to recognizing the “thing itself”
      • Naturally, recognition in this sense is impossible without human labeling
      • What if we deliberately leave humans out of the loop and still aim for object detection?
        • Would that be Unsupervised Learning? Can we achieve object detection (or something close) without human supervision?
          • Labeling with natural language would obviously not be possible
          • Can we create an intelligent form of visual perception other than the human one?
            • Can we reach a level that can be called intelligent?
            • Or rather, if humans can’t understand it, can we call it intelligent?

#informationscience