Also known as Computer Vision (CV)
Lecture on Information Science
-
Definitions of “CV,” “Computer Graphics (CG),” and “Image Processing”
- Extracting “shape/appearance/movement/meaning” from images/videos is what CV is about
- Counterpart of CV: Computer Graphics (CG)
- Creating “images/videos” from “shape/appearance/movement/meaning” is CG
- Opposite direction of CV
- Image Processing involves transforming “images/videos” into different “images/videos”
- Such as changing colors
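As a small illustration of the "image in, image out" definition above, here is a minimal sketch of a color change. It assumes a toy image representation (a list of rows of `(R, G, B)` tuples) and uses the common ITU-R BT.601 luma weights; the function name `to_grayscale` is hypothetical and not from the lecture.

```python
def to_grayscale(image):
    """Map an RGB image to a grayscale image of the same size.

    `image` is a list of rows; each pixel is an (R, G, B) tuple of ints in 0..255.
    The weights are the ITU-R BT.601 luma coefficients (an assumption of this sketch).
    """
    gray = []
    for row in image:
        gray.append([round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row])
    return gray


# A 2x2 toy image: red, green, blue, white.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_grayscale(image)
print(gray)
```

The input and the output are both images, which is what separates this from CV (image to meaning) and CG (meaning to image).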
-
(blu3mo) The above seems slightly different from the definition in the book “Digital Image Processing”
- The image posted on the Image Processing page seems to present a slightly different definition
-
Part of what makes CV hard is that its difficulty is hard for the general public to appreciate
- It’s difficult to convey how hard it is for a computer to recognize the plastic bottle right in front of you as a plastic bottle
- (blu3mo) Conveying this kind of understanding also seems to be one of the perspectives in Programming Education
- What computers can/cannot do
-
Topics
- Feature point detection and matching
- Structure from Motion (SfM): reconstructing shape from camera motion (generating point cloud data from multiple images)
- Computational Photography
- 3D reconstruction
- Image recognition (such as YOLO)
- Recently, there is also a focus on Fairness (issues where AI exhibits racial bias)
-
Image Recognition
-
Combination: Machine Learning x Computer Vision x Natural Language Processing
-
Object detection, semantic segmentation, etc.
-
History
- SIFT: Local features using histograms of gradient orientations (weighted by gradient magnitude)
- Bag of Visual Words: Applying the Bag of Words method from Natural Language Processing
- Image datasets: Caltech-101
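The two ideas in this history can be sketched together. The code below is a heavily simplified illustration, not actual SIFT: it builds a single gradient-orientation histogram per patch (real SIFT also uses keypoint detection, spatial cells, Gaussian weighting, and orientation normalization), then counts "visual words" by assigning each descriptor to its nearest codebook entry. The function names and the hand-written `codebook` are assumptions; in practice the codebook is usually learned, for example with k-means.

```python
import math


def orientation_histogram(patch, bins=8):
    """Histogram of gradient orientations; each vote is weighted by gradient magnitude.

    `patch` is a list of rows of grayscale intensities. Border pixels are skipped.
    """
    hist = [0.0] * bins
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]  # central difference in x
            dy = patch[y + 1][x] - patch[y - 1][x]  # central difference in y
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)  # orientation in [0, 2*pi)
            index = int(angle / (2 * math.pi) * bins) % bins
            hist[index] += magnitude
    return hist


def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to `descriptor` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))


def bag_of_visual_words(descriptors, codebook):
    """Count how many descriptors fall on each visual word, ignoring their positions."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    return counts


# Toy patches: intensity increases left-to-right, or top-to-bottom.
horizontal_ramp = [[x for x in range(4)] for _ in range(4)]
vertical_ramp = [[y for _ in range(4)] for y in range(4)]
patches = [horizontal_ramp, vertical_ramp, horizontal_ramp]
descriptors = [orientation_histogram(p) for p in patches]

# Hypothetical two-word codebook, written by hand for this example.
codebook = [
    [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
word_counts = bag_of_visual_words(descriptors, codebook)
print(word_counts)
```

As with Bag of Words in NLP, the resulting count vector discards where each local feature occurred and keeps only how often each word appeared.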
-
Methods (basic ones)
- To improve recognition:
- Use small filter images (kernels) suited to recognition instead of raw patches cut from the images
- Use richer representations than one-hot encoding
- This resembles the path NLP took
- Conceptually, shrink the spatial dimensions (height and width) of the image while expanding the depth (channel) dimension
- Hierarchical structure
- (blu3mo) Similar to the diagram in the CNN section
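A rough sketch of these two points, under stated assumptions: `correlate2d_valid` slides a small kernel over an image (the operation CNN layers call convolution), and `trace_shapes` applies the standard output-size formula `(n + 2p - k) // s + 1` to show the spatial size shrinking while the channel count grows. The layer settings and function names are hypothetical, chosen only for illustration; in a real CNN the kernel values are learned rather than written by hand.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height, width, channels, layers):
    """Return the (height, width, channels) shape after each layer.

    Each layer is a tuple (kernel, stride, padding, out_channels).
    """
    shapes = [(height, width, channels)]
    for kernel, stride, padding, out_channels in layers:
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
        shapes.append((height, width, out_channels))
    return shapes


# A hand-written kernel that responds to a dark-to-bright step in the x direction.
step_image = [[0, 0, 1, 1] for _ in range(3)]
edge_kernel = [[-1, 1]]
edge_response = correlate2d_valid(step_image, edge_kernel)

# Hypothetical stack of three 3x3, stride-2 layers applied to a 32x32 RGB image.
layers = [(3, 2, 1, 16), (3, 2, 1, 32), (3, 2, 1, 64)]
shapes = trace_shapes(32, 32, 3, layers)
print(edge_response)
print(shapes)
```

Each layer halves the height and width while the number of channels doubles, which is the hierarchical "compress spatially, expand in depth" pattern described above.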
-
-
(blu3mo) As long as we work with datasets collected by humans, our models adapt and overfit to human cognition
- Well, that is the goal in the first place
- It might lead to philosophical discussions about whether image recognition relates to recognizing the “thing itself”
- Naturally, recognition in this sense is impossible without human labeling
- What if we deliberately leave humans out of the loop and still aim for object detection?
- Would that be Unsupervised Learning, or can we achieve object detection (or something close) without human supervision?
- Labeling through natural language is obviously not possible
- Can we create an intelligent form of visual perception other than the human one?
- Can we reach a level that can be called intelligent?
- Or rather, if humans can’t understand it, can we call it intelligent?