EIASR (Image and speech recognition)

Winter 2017/2018


Meeting times and rooms

Lecture: Monday, 10:15-12:00, room 121.

Exercises: Wednesday (selected weeks), 14:15-16:00, room 108.

Project: Wednesday (selected weeks), 14:15-16:00, room 108.

Teaching staff and contact info

Prof. Włodzimierz KASPRZAK (lecture, exercises)
Office: room 565, E&IT Faculty, Institute of Control and Computation Eng.
Office hours: Monday and Friday, 14.00-15.00
Phone: +22 234 7866
W.Kasprzak at elka.pw.edu.pl

Maciej Stefańczyk, M.Sc. (exercises, project)
Office: room 564, E&IT Faculty, Institute of Control and Computation Eng.
Office hours:
Phone: +22 234 xxx
M.Stefanczyk at elka.pw.edu.pl

Short course description

Course objectives: The goal is to learn the basic methods and algorithms of digital image and speech analysis. After completing this course, students will be able to design image and speech recognition programs that perform pattern (image or speech) processing, pattern segmentation, and object (or word) recognition.

Students are expected to have the following background:

Course materials
Lecture notes will be posted periodically on the course web site. Selected chapters from the books below are recommended as optional reading.

Marks and Grading

Assessment is marked out of 100 points. The marks correspond to ECTS grades as follows:

  ECTS grade   Mark
  A, 5         100-91
  B, 4.5       90-81
  C, 4         80-71
  D, 3.5       70-61
  E, 3         60-51
  F/FX, 2      50 or less
Students collect assessment points through continuous assessment during the semester. The assessment consists of exercises, a project, and tests.

The pass mark for this course is set at 25 pts. for the combined assessment of exercises and project, and 26 pts. for the total assessment of tests.

In addition to satisfying the above assessment requirements, every student must satisfy the attendance requirements: attendance at exercises and project meetings is obligatory, while attendance at lectures is optional. Credits are awarded to students who pass the course.
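The grading rules above can be sketched as a short Python helper. This is only an illustration, not an official grading tool: the function names `ects_grade` and `meets_pass_marks` are hypothetical, whole-number points are assumed, and the pass marks are read as "at least 25" and "at least 26". How the two pass marks interact with the total mark is not stated here, so the sketch keeps the two checks separate.

```python
def ects_grade(total_points):
    """Map a total mark (0-100) to an (ECTS letter, numeric grade) pair."""
    if not 0 <= total_points <= 100:
        raise ValueError("total_points must be between 0 and 100")
    # Lower bound of each band, taken from the grading table.
    bands = [
        (91, "A", 5.0),
        (81, "B", 4.5),
        (71, "C", 4.0),
        (61, "D", 3.5),
        (51, "E", 3.0),
    ]
    for lower_bound, letter, grade in bands:
        if total_points >= lower_bound:
            return letter, grade
    # 50 points or less.
    return "F/FX", 2.0


def meets_pass_marks(exercises_and_project, tests):
    """Check both pass marks (assumption: 'at least' the stated points)."""
    return exercises_and_project >= 25 and tests >= 26
```

For example, `ects_grade(78)` returns `("C", 4.0)`, and `meets_pass_marks(24, 40)` returns `False` because the combined exercises and project score is below 25.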


Lecture

Place and time: Monday, 10.15-12.00, room 6.

Lecture schedule (tentative):

Tests:
  1. [4.12] Part 1, 14.15-16.00, room 108
  2. [22.01] Part 2, 14.15-16.00, room 108

Course notes and readings

Lecture and exercise notes:
  1. W. Kasprzak: Image and speech recognition. Lecture notes, WUT, Warszawa, 2017, v.5, 12 chapters.
  2. W. Kasprzak: Image and speech recognition. Exercises, WUT, Warszawa, 2017, v.5.
Books:
  1. W. Kasprzak: Rozpoznawanie obrazów i sygnałów mowy (in Polish). Oficyna Wydawnicza Politechniki Warszawskiej, Warszawa, 2009.
  2. R. Duda, P. Hart, D. Stork: Pattern Classification. 2nd edition, John Wiley & Sons, New York, 2001. (Chapters: 2, 3, 4, 10)
  3. R. C. Gonzalez, R. E. Woods: Digital Image Processing. 3rd edition, Prentice Hall, 2008.
  4. I. Pitas: Digital Image Processing Algorithms and Applications. Prentice Hall, New York etc., 2000. (Chapters: 2, 3, 5, 6, 7)
  5. L. R. Rabiner, R. W. Schafer: Introduction to Digital Speech Processing. Foundations and Trends in Signal Processing, 2007. (Sections: 1-6, 9)
  6. J. Benesty, M. M. Sondhi, Y. Huang (eds): Handbook of Speech Processing. Springer, Berlin Heidelberg, 2008.
  7. W. Kasprzak: Adaptive computation methods in digital image sequence analysis. Prace Naukowe - Elektronika, No. 127, Warsaw University of Technology Publishing House, Warszawa, 2000, 172 pages. (Chapters: 3, 4)
Other sources:
  1. The OpenCV Reference Manual. Release (or higher). 2014 (or later), http://opencv.org/
  2. Kaldi speech recognition project. http://kaldi-asr.org/

Suggested Readings
For each lecture section, one or more suggested readings are given below.

Weeks 1, 2: L1. Introduction - a pattern recognition system
  Lecture notes: IASR_1
Week 2: L5. Image processing
  Readings: [Kas09, ch. 3], [Pitas, 2], [Gonzalez, 2.5, 4.2, 7.3]
  Lecture notes: IASR_5
Weeks 3, 4: L2. Pattern transformation
  Readings: [Kas09, 2.1], [Gonzalez, 3.6], [Duda, 4.10-4.11], [Kas00, 3, 4]
  Lecture notes: IASR_2
Week 5: L9. Speech signal
  Readings: [Kas09, ch. 7 and 8], [Kas09, ch. 10], [Rabiner, 2.1-2.4], [Rabiner, 3.1-3.2]
  Lecture notes: IASR_9
Week 6: L10. Speech features
  Readings: [Kas09, 9.1-9.2], [Rabiner, 3.3, 4.1-4.6]
  Lecture notes: IASR_10
Week 7: L6. Image segmentation I
  Readings: [Kas09, 4.1-4.2], [Pitas, 3], [Gonzalez, 4.3, 4.4, 7.1]
  Lecture notes: IASR_6
Week 8: L7. Image segmentation II
  Readings: [Kas09, 4.3-4.7], [Pitas, 5-7], [Gonzalez, 3.7, 7.2, 7.4, 8.1-8.3]
  Lecture notes: IASR_7
Week 9: Test 1
Week 10: L3. Pattern classification
  Readings: [Kas09, 2.2-2.8, 9.3], [Duda, 2-3, 10]
  Lecture notes: IASR_3
Weeks 11, 12: L4. Pattern sequences
  Readings: [Kas09, ch. 5], [Rabiner, 4.7]
  Lecture notes: IASR_4
Week 13: L11. Word and sentence recognition
  Readings: [Kas09, 11], [Rabiner, 6]
  Lecture notes: IASR_11
Weeks 14, 15: L8. Object recognition
Week 16: Test 2


Exercises

Place and time: Wednesday (selected weeks), 14.15-16.00, room 108.

Marks: Participants can earn up to 10 points. Points are deducted for absence (1 point for each hour missed).

Project work:

Goal: Each project, carried out by 1-2 persons, is to design an analysis system that performs an image or speech recognition task and to implement it as an application in a programming language (C++, Java, Matlab, or C# preferred).

Place and time: Wednesday (selected weeks), 14.15-16.00, room 108.


Schedule:
  1. [4.10] - Lecture
  2. [18.10] - Project introduction and topics
  3. [15.11] - Project assignments
  4. [29.11] - Validation of assumptions
  5. [13.12] - I. Preliminary report deadline
  6. [3.01] - II. Prototype
  7. [24.01] - III. Completed work (final evaluation)
  8. [until 26.01] - Late final evaluation
Marks: Participants can earn up to 30 points.

Suitable implementation tools (open-source libraries):

  1. OpenCV - Open Source Computer Vision library - diverse image processing and analysis algorithms in C++.
  2. DisCODe - Distributed Component Oriented Data Processing - a C++ framework facilitating the development of data (image, speech) processing algorithms (T. Kornuta and M. Stefańczyk at WUT).
  3. MARF - The Modular Audio Recognition Framework (written in Java).
  4. Sphinx-4 - a speech recognizer written in Java.
  5. Kaldi - a toolkit for speech recognition written in C++.

W. Kasprzak.
Last modification: 23.09.2017.