EIASR (Image and speech recognition)

M.Sc. course. Summer 2018.


Meeting times and rooms

Monday, 8:15-10:00 a.m., room 117.

Exercises / Project:
Wednesday, 14:15-16:00, room 170.

Teaching staff and contact info

Prof. Włodzimierz KASPRZAK (lecture)
Office: room 565, E&IT Faculty, Institute of Control and Computation Eng.
Office hours: Monday, 10.15-12.00
Phone: +22 234 7866
W.Kasprzak at elka.pw.edu.pl

Maciej Stefańczyk, M.Sc. (exercises, project)
Office: room 564, E&IT Faculty, Institute of Control and Computation Eng.
Office hours:
Phone: +22 234 xxx
M.Stefanczyk at elka.pw.edu.pl

Short course description

Course objectives: The goal is to learn basic methods and algorithms of digital image and speech analysis. After completing this course, students will be able to design image and speech recognition programs covering pattern (image or speech) processing, pattern segmentation, and visual object or speech recognition.

Students are expected to have the following background:

Course materials
Lecture notes will be posted periodically on the course web site. Selected chapters from the books below are recommended as optional reading.


Marks and Grading

Assessment is marked out of 100 points. Marks convert to ECTS grades as follows:
ECTS grade | Mark
A (5)      | 91-100
B (4.5)    | 81-90
C (4)      | 71-80
D (3.5)    | 61-70
E (3)      | 51-60
F/FX (2)   | 50 or less
Students collect assessment points through continuous assessment during the semester. The assessment consists of exercises (up to 10 points), project work (up to 30 points) and two tests.
The pass mark for this course is set at 25 points for the combined assessment of exercises and project, and 26 points for the total assessment of tests.
In addition to meeting these assessment requirements, every student must satisfy the attendance requirements: attendance at exercises and project meetings is obligatory, while lecture attendance is optional.
Credits will be awarded to candidates who pass this course.

Lecture:

Place and time: Monday, 8.15-10.00, room 117.

Test schedule (tentative):
  1. [23.04] Part 1, 8.15-10.00, room 117
  2. [11.06] Part 2, 8.15-10.00, room 117
  3. [xx.06] Retake test (1+2), xxx, room xxx

Course notes and readings

Lecture and Exercise notes:
  1. W. Kasprzak: Image and speech recognition. Lecture notes, WUT, Warszawa, 2018, v.6, 13 chapters.
  2. W. Kasprzak: Image and speech recognition. Exercises, WUT, Warszawa, 2018, v.6.
Books:
  1. W. Kasprzak: Rozpoznawanie obrazów i sygnałów mowy (in Polish; Recognition of images and speech signals). Oficyna Wydawnicza Politechniki Warszawskiej (Warsaw University of Technology Press), Warszawa, 2009.
  2. R. O. Duda, P. E. Hart, D. G. Stork: Pattern Classification. 2nd edition, John Wiley & Sons, New York, 2001. (Chapters: 2, 3, 4, 10)
  3. R. C. Gonzalez, R. E. Woods: Digital Image Processing. 3rd edition, Prentice Hall, 2008.
  4. I. Pitas: Digital Image Processing Algorithms and Applications. Prentice Hall, New York, 2000. (Chapters: 2, 3, 5, 6, 7)
  5. L. R. Rabiner, R. W. Schafer: Introduction to Digital Speech Processing. Foundations and Trends in Signal Processing, 2007. (Sections: 1-6, 9)
  6. J. Benesty, M. M. Sondhi, Y. Huang (eds.): Handbook of Speech Processing. Springer, Berlin Heidelberg, 2008.
Other sources:
  1. The OpenCV Reference Manual. Release (or higher). 2014 (or later), http://opencv.org/
  2. Kaldi speech recognition project. http://kaldi-asr.org/

Suggested Readings
For each lecture section, one or more suggested readings are given below.
Week   | Topic                                                 | Readings                                                                   | Lecture notes
1      | L1. Introduction: a pattern recognition system        |                                                                            | IASR-1
2      | L6. Image processing                                  | [Kas09, ch. 3], [Pitas, 2], [Gonzalez, 2.5, 4.2, 7.3]                      | IASR-6
3, 4   | L2. Pattern transformation I; L3. Pattern transformation II | [Kas09, 2.1], [Gonzalez, 3.6], [Duda, 4.10-4.11], [Kas00, 3, 4]      | IASR-2, IASR-3
5      | L10. Speech signal and phonetics                      | [Kas09, ch. 7 and 8], [Kas09, ch. 10], [Rabiner, 2.1-2.4], [Rabiner, 3.1-3.2] | IASR-10
6      | L11. Speech features                                  | [Kas09, 9.1-9.2], [Rabiner, 3.3, 4.1-4.6]                                  | IASR-11
7      | L7. Image segmentation I                              | [Kas09, 4.1-4.2], [Pitas, 3], [Gonzalez, 4.3, 4.4, 7.1]                    | IASR-7
8      | L8. Image segmentation II                             | [Kas09, 4.3-4.7], [Pitas, 5-7], [Gonzalez, 3.7, 7.2, 7.4, 8.1-8.3]         | IASR-8
9      | Test 1                                                |                                                                            |
10, 11 | L4. Pattern classification                            | [Kas09, 2.2-2.8, 9.3], [Duda, 2-3, 10]                                     | IASR-4
12     | L5. Pattern sequences                                 | [Kas09, ch. 5], [Rabiner, 4.7]                                             | IASR-5
13     | L12. Speech recognition; L9. Object recognition       | [Kas09, 11], [Rabiner, 6]                                                  | IASR-12, IASR-9
14     | L13. Speaker recognition                              |                                                                            |
15     | Test 2                                                |                                                                            |

Exercises:

Place and time: Wednesday (selected weeks), 14.15-16.00, room 170.
Marks: Participants can earn up to 10 points. Points are deducted for absence (-1 point per hour missed).

Project work:

Goal: Each project, carried out by 1-2 students, is to design a particular analysis system and implement it as an application in a programming language (C++, Java, Matlab or C# preferred). The system performs an image or speech recognition task.

Place and time: Wednesday (selected weeks), 14.15-16.00, room 170.

Schedule:
  1. [7.03] - Project topics
  2. [21.03] - Project assignments
  3. [4.04] - Validation of assumptions
  4. [11.04] - Validation of assumptions
  5. [25.04] - Prototype
  6. [13.06] - Completed work (final evaluation)
  7. [till 18.06] - Late final evaluation
Marks: Participants can earn up to 30 points.

Suitable implementation tools (open-source libraries):

  1. OpenCV - Open Source Computer Vision library - diverse image processing and analysis algorithms in C++.
  2. DisCODe - Distributed Component Oriented Data Processing – a C++ framework facilitating the development of data (image, speech) processing algorithms (T.Kornuta and M.Stefańczyk at WUT).
  3. MARF - The Modular Audio Recognition Framework (written in Java).
  4. Sphinx-4 - A speech recognizer written in Java.
  5. Kaldi - a toolkit for speech recognition written in C++.

W. Kasprzak.
Last modification: 14.02.2018.