EIASR (Image and speech recognition)

M.Sc. course. Winter 2018/19.


Meeting times and rooms

Lecture:
Monday, 10:15-12:00, room 121.

Exercises / Project:
Wednesday, 14:15-16:00, room 108.



Teaching staff and contact info

Prof. Włodzimierz KASPRZAK (lecture)
Office: room 565, E&IT Faculty, Institute of Control and Computation Eng.
Office hours: Tuesday, 12.15-14.00
Phone: +22 234 7866
W.Kasprzak at elka.pw.edu.pl

Dr Artur Wilkowski, M.Sc. Maciej Stefańczyk (exercises, project)
Office: room 564, E&IT Faculty, Institute of Control and Computation Eng.
Office hours:
Phone: +22 234 xxx
A.Wilkowski(at)pw.edu.pl, M.Stefanczyk(at)elka.pw.edu.pl



Short course description

Course objectives
The goal is to learn the basic methods and algorithms of digital image and speech analysis. After completing this course, students will be able to design image and speech recognition programs that perform pattern (image or speech) processing, pattern segmentation, and visual object or speech recognition.

Prerequisites
Students are expected to have the following background:

Course materials
Lecture notes will be posted periodically on the course web site. Selected chapters from the books below are recommended as optional reading.




Marks and Grading

Assessment will be marked out of a hundred. The marks equate to ECTS grades as given below:
ECTS grade | A, 5 | B, 4.5 | C, 4 | D, 3.5 | E, 3 | F/FX, 2
Mark | 100-91 | 90-81 | 80-71 | 70-61 | 60-51 | 50 or less
Students collect assessment points through continuous assessment during the semester. The assessment consists of exercises (up to 8 pts.), project work (up to 32 pts.) and two tests (Part 1 and Part 2).

The pass mark for this course is set at 25 pts. for the combined assessment of exercises + project, and 26 pts. for the total assessment of tests.

In addition to satisfying the above assessment requirements, every student must satisfy the attendance requirements: attendance at exercises and project work is obligatory, while attendance at the lecture is optional. Credits will be awarded to candidates who pass this course.
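The grade bands and pass marks above can be checked with a short script. This is only an illustrative sketch, not an official grading tool: the function names `ects_grade` and `passes` are made up for this example, marks are assumed to be whole points, and the pass marks are assumed to be inclusive (exactly 25 and 26 pts. pass). Attendance requirements are not modelled.

```python
# Grade bands from the table above: (lowest mark in band, ECTS letter, numeric grade).
GRADE_BANDS = [
    (91, "A", 5.0),
    (81, "B", 4.5),
    (71, "C", 4.0),
    (61, "D", 3.5),
    (51, "E", 3.0),
]


def ects_grade(total):
    """Map a total mark (0-100, whole points assumed) to (ECTS letter, numeric grade)."""
    if not 0 <= total <= 100:
        raise ValueError("total mark must be between 0 and 100")
    for lowest, letter, numeric in GRADE_BANDS:
        if total >= lowest:
            return letter, numeric
    return "F/FX", 2.0  # 50 or less


def passes(exercises_and_project, tests):
    """Check both pass marks: 25 pts. (exercises + project) and 26 pts. (tests).

    Assumption: the thresholds are inclusive.
    """
    return exercises_and_project >= 25 and tests >= 26
```

For example, `ects_grade(78)` falls into the 80-71 band, and `passes(24, 60)` is false because the exercises + project threshold is not met even though the test score is high.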



Lecture

Place and time: Monday, 10.15-12.00, room 121.

Lecture schedule (tentative):

Tests:
  1. [26.11] Part 1, 10.15-12.00, room 121
  2. [14.01] Part 2, 10.15-12.00, room 121
  3. [21.01] Retake test (1+2), 10.15-12.00, room 121



Course notes and readings

Lecture and Exercise notes:
  1. W. Kasprzak: Image and speech recognition. Lecture notes, WUT, Warszawa, 2018, v.6, 12 chapters.
  2. W. Kasprzak: Image and speech recognition. Exercises, WUT, Warszawa, 2018, v.6.
Readings:
  1. W. Kasprzak: (in Polish) Rozpoznawanie obrazów i sygnałów mowy. Oficyna wydawnicza Politechniki Warszawskiej, Warszawa, 2009.
  2. R. Duda, P. Hart, D. Stork: Pattern Classification. 2nd edition, John Wiley & Sons, New York, 2001. (Chapters: 2,3,4,10)
  3. R. C. Gonzalez, R. E. Woods: Digital Image Processing. 3rd edition, Prentice Hall, 2008.
  4. I. Pitas: Digital Image Processing Algorithms and Applications, Prentice Hall, New York etc. 2000. (Chapters: 2,3,5,6,7)
  5. L. R. Rabiner, R. W. Schafer: Introduction to Digital Speech Processing. Foundations and Trends in Signal Processing, 2007. (Sections: 1-6, 9)
  6. J. Benesty, M.M. Sondhi, Y. Huang (eds): Handbook of Speech Processing. Springer, Berlin Heidelberg, 2008.
Other sources:
  1. The OpenCV Reference Manual. Release 2.4.9.0 (or higher). 2014 (or later), http://opencv.org/
  2. Kaldi speech recognition project. http://kaldi-asr.org/

Suggested Readings
For each lecture section, one or more suggested readings are given below.
Week | Topic | Readings | Lecture notes
1 | L1. Introduction - a pattern recognition system | - | IASR-1
2 | L6. Image processing | [Kas09, ch. 3], [Pitas, 2], [Gonzalez, 2.5, 4.2, 7.3] | IASR-6
3, 4 | L2. Pattern transformation I; L3. Pattern transformation II | [Kas09, 2.1], [Gonzalez, 3.6], [Duda, 4.10-4.11], [Kas00, 3, 4] | IASR-2, IASR-3
5 | L10. Speech signal and phonetics | [Kas09, ch. 7 and 8], [Kas09, ch. 10], [Rabiner, 2.1-2.4], [Rabiner, 3.1-3.2] | IASR-10
6 | L11. Speech features | [Kas09, 9.1-9.2], [Rabiner, 3.3, 4.1-4.6] | IASR-11
7 | L7. Image segmentation I | [Kas09, 4.1-4.2], [Pitas, 3], [Gonzalez, 4.3, 4.4, 7.1] | IASR-7
8 | L8. Image segmentation II | [Kas09, 4.3-4.7], [Pitas, 5-7], [Gonzalez, 3.7, 7.2, 7.4, 8.1-8.3] | IASR-8
9 | Test 1 | - | -
10, 11 | L4. Pattern classification | [Kas09, 2.2-2.8, 9.3], [Duda, 2-3, 10] | IASR-4
12 | L5. Pattern sequences | [Kas09, ch. 5], [Rabiner, 4.7] | IASR-5
13 | L12. Speech recognition | [Kas09, 11], [Rabiner, 6] | IASR-12
14 | L9. Object recognition | - | IASR-9
15 | Test 2 | - | -



Exercises:

Place and time: Wednesday (selected weeks), 14.15-16.00, room 108.

Marks: Participants can earn up to 8 points. One point is deducted for each hour of absence. With more than 8 hours of absence, the exercises and the entire course are failed.
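The absence rule for exercises amounts to simple arithmetic, sketched below. The function names are hypothetical, and one detail is an assumption rather than a stated rule: the source does not say whether the deduction can push the result below zero, so the sketch floors it at 0.

```python
MAX_EXERCISE_POINTS = 8  # stated maximum for the exercises
MAX_ABSENCE_HOURS = 8    # more absence than this fails the exercises and the course


def exercise_points(earned, absence_hours):
    """Return exercise points after deducting 1 point per hour of absence."""
    if not 0 <= earned <= MAX_EXERCISE_POINTS:
        raise ValueError("earned points must be between 0 and 8")
    if absence_hours < 0:
        raise ValueError("absence hours cannot be negative")
    # Assumption: the result is floored at 0; the course rules do not say.
    return max(0, earned - absence_hours)


def attendance_ok(absence_hours):
    """True unless the absence exceeds 8 hours."""
    return absence_hours <= MAX_ABSENCE_HOURS
```

For instance, a participant who earned 8 points and missed 2 hours keeps 6 points and still meets the attendance requirement.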



Project work:

Goal: Each project, carried out by 1-2 persons, consists in designing a particular analysis system and implementing it as an application in a programming language (C++, Java, Matlab or C# preferred). The analysis system performs an image or speech recognition task.

Place and time: Wednesday (selected weeks), 14.15-16.00, room 108.

Schedule:

  1. Project topics
  2. Project assignments
  3. Validation of assumptions
  4. Validation of assumptions
  5. Prototype
  6. Completed work (final evaluation)
  7. Late final evaluation
Marks: Participants can earn up to 32 points.

Suitable implementation tools - open-source libraries:

  1. OpenCV - Open Source Computer Vision library - diverse image processing and analysis algorithms in C++.
    Documentation.
    Download openCV
  2. DisCODe - Distributed Component Oriented Data Processing – a C++ framework facilitating the development of data (image, speech) processing algorithms (T.Kornuta and M.Stefańczyk at WUT).
  3. MARF - The Modular Audio Recognition Framework (written in Java).
  4. Sphinx-4 - A speech recognizer written in Java.
  5. The KALDI project page - provides a toolkit for speech recognition written in C++.

W. Kasprzak.
Last modification: 3.10.2018.