EIASR (Image and speech recognition)

Winter 2016/2017


Meeting times and rooms

Monday, 10:15-12:00, room 6.

Wednesday (selected weeks), 14:15-16:00, room 108.

Teaching staff and contact info

Prof. Włodzimierz KASPRZAK (lecture, exercises)
Office: room 565, E&IT Faculty, Institute of Control and Computation Eng.
Office hours: Tuesday, 12.15-14.00 (i.e. 12:15 p.m. - 2 p.m.)
Phone: +48 22 234 7866
W.Kasprzak at elka.pw.edu.pl

Dr Paweł WAWRZYŃSKI (exercises, project)
Office: room 560, E&IT Faculty, Institute of Control and Computation Eng.
Office hours:
Phone: +48 22 234 7120

Short course description

Course objectives: The goal is to learn basic methods and algorithms of digital image and speech analysis. After completing this course, students will be able to design image and speech analysis programs that perform pattern (image or speech) processing, pattern segmentation, and object (or word) recognition.

Students are expected to have the following background:

Course materials
Lecture notes will be posted periodically on the course web site. Selected chapters from the books below are recommended as optional reading.

Marks and Grading

Assessment will be marked out of a hundred. The marks equate to ECTS grades as given below:
Mark          ECTS grade   Grade
100-91        A            5
90-81         B            4.5
80-71         C            4
70-61         D            3.5
60-51         E            3
50 or less    F/FX         2
Students collect assessment points through continuous assessment during the semester.

In addition to satisfying the assessment requirements, every student must satisfy the attendance requirements: attendance at exercises is obligatory, while attendance at lectures is optional.

The pass mark for this course is 51 points. Credits are awarded to candidates who pass the course.


Place and time: Monday, 10.15-12.00, room 6, or Wednesday, 14.15-16.00, room 108.

Lecture schedule (tentative):

Tests:
  1. [30.11, Wednesday] Part 1, 14.15-16.00, room 108
  2. [25.01, Wednesday] Part 2, 14.15-16.00, room 108
  3. [1.02.2017, Wednesday] Retake (both parts), 11.00-13.00, room 17

Course notes and readings

Lecture and Exercise notes:
  1. W. Kasprzak: Image and speech recognition.
    Lecture notes, Part I: Pattern recognition,
    Part II: Image recognition,
    Part III: Speech recognition.
    WUT, Warszawa, 2015, v.3, 11 chapters.

  2. W. Kasprzak: Image and speech recognition. Exercises, WUT, Warszawa, 2015, v.3.
Books:
  1. W. Kasprzak: (in Polish) Rozpoznawanie obrazów i sygnałów mowy. Oficyna wydawnicza Politechniki Warszawskiej, Warszawa, 2009.
  2. R. Duda, P. Hart, D. Stork: Pattern Classification. 2nd edition, John Wiley & Sons, New York, 2001. (Chapters: 2,3,4,10)
  3. R. C. Gonzalez, R. E. Woods: Digital Image Processing. 3rd edition, Prentice Hall, 2008.
  4. I. Pitas: Digital Image Processing Algorithms and Applications. Prentice Hall, New York etc., 2000. (Chapters: 2,3,5,6,7)
  5. L. R. Rabiner, R. W. Schafer: Introduction to Digital Speech Processing. Foundations and Trends in Signal Processing, 2007. (Sections: 1-6, 9)
  6. W. Kasprzak: Adaptive computation methods in digital image sequence analysis. Prace Naukowe - Elektronika, Warsaw University of Technology Publishing House, Warszawa, No. 127 (2000), 172 pages. (Chapters: 3,4)
Other sources:
  1. J. Benesty, M.M. Sondhi, Y. Huang (eds): Handbook of Speech Processing. Springer, Berlin Heidelberg, 2008.
  2. The OpenCV Reference Manual. Release (or higher). 2014 (or later), http://opencv.org/
  3. Kaldi speech recognition project. http://kaldi-asr.org/

Suggested Readings
For each lecture section, one or more suggested readings are given below.
Week | Topic | Readings | Lecture notes
Week 1, 2 | L1. Introduction. Pattern Recognition | [Kas09, ch.1] | EIASR_1
Week 2, 3 | L4. Iconic processing | [Kas09, ch.3], [Pitas, 2], [Gonzalez, 2.5, 4.2, 7.3] | EIASR_4
Week 4 | L2. Pattern transformation | [Kas09, 2.1], [Gonzalez, 3.6], [Duda, 4.10-4.11], [Kas00, 3, 4] | EIASR_2
Week 5 | L8. Speech pre-processing | [Kas09, ch.7 and 8], [Rabiner, 3.1-3.2] | EIASR_8
Week 6 | L5. Image segmentation I | [Kas09, 4.1-4.2], [Pitas, 3], [Gonzalez, 4.3, 4.4, 7.1] | EIASR_5
Week 7 | L6. Image segmentation II | [Kas09, 4.3-4.7], [Pitas, 5-7], [Gonzalez, 3.7, 7.2, 7.4, 8.1-8.3] | EIASR_6
Week 8 | Test 1 | - | -
Week 9 | L9. Acoustic speech features | [Kas09, 9.1-9.2], [Rabiner, 3.3, 4.1-4.6] | EIASR_9
Week 10 | Retake exercises | - | -
Week 11, 12 | L3. Pattern classification | [Kas09, 2.2-2.8, 9.3], [Duda, 2-3, 10] | EIASR_3
Week 13 | L10. Phonetic speech model (cancelled) | [Kas09, ch.10], [Rabiner, 2.1-2.4] | EIASR_10
Week 13 | L3.A Pattern sequence recognition | [Kas09, ch.5] | EIASR_3A
Week 13 cont. | L11. Word and sentence recognition | [Kas09, 5.1, ch.11], [Rabiner, 4.7, 6] | EIASR_11
Week 14 | L7. 3-D object recognition | - | -
Week 15 | Test 2 | - | -
Week 16 | Retake test | - | -


Place and time: Wednesday (selected weeks), 14.15-16.00, room 108.

Project work:

Goal: Each project, intended for 1-2 persons, consists of designing a particular analysis system and implementing it as an application in a programming language (C++, Java, Matlab, or C# preferred). The analysis system performs an image or speech recognition task.

Place and time: Wednesday (selected weeks), 14.15-16.00, room 108, or Monday (selected weeks), 10.15-12.00, room 121.


  1. [5.10] - Project presentation and assignments
  2. [2.11] - Validation of assumptions
  3. [9.11] - Validation of assumptions, cont.
  4. [28.11, Monday!] - I. Preliminary report deadline
  5. [12.12, Monday!] - Work-in-progress
  6. [19.12, Monday!] - II. Prototype evaluation
  7. [23.01, Monday!] - III. Completed project deadline (final evaluation)
  8. [till 30.01, office hours] - III.a Delayed final evaluation
Marks: Participants can earn up to 40 points.

Suitable implementation tools - open-source libraries:

  1. OpenCV - Open Source Computer Vision library - diverse image processing and analysis algorithms in C++.
  2. DisCODe - Distributed Component Oriented Data Processing - a C++ framework facilitating the development of data (image, speech) processing algorithms (T. Kornuta and M. Stefańczyk at WUT).
  3. MARF - The Modular Audio Recognition Framework (written in Java).
  4. Sphinx-4 - a speech recognizer written in Java.
  5. Kaldi - a toolkit for speech recognition written in C++.

W. Kasprzak.
Last modification: 26.12.2016.