The IVR server would intercept VoIP calls and handle them. From what I can tell, no telephony hardware would be needed, since everything would be done in software on the server. The server could also contain the code to convert the VoIP signal, if that is even necessary.
I have been searching for this, but I thought I would throw it out here as well.

Try Asterisk: http://www.asterisk.org/
For open voice recognition, try:
http://www.voicexml.org/
http://en.wikipedia.org/wiki/Speech_recognition
The Wikipedia article links to open-source projects such as
http://cmusphinx.sourceforge.net/html/cmusphinx.php
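The Asterisk suggestion fits the "all done in software" idea: Asterisk answers the VoIP call and handles the audio codecs itself, and it can hand the call to an external program through AGI (Asterisk Gateway Interface), which is where IVR logic could live. Below is a minimal sketch in Python, assuming Asterisk's plain-text AGI protocol: `agi_*` header lines ending with a blank line, then one command per line, each answered with a reply like `200 result=N`. The prompt file `ivr-main-menu`, the `MENU` table, and the `IVR_CHOICE` variable are hypothetical names for illustration only.

```python
import re
import sys
from typing import Dict, Optional, TextIO

# Matches a successful AGI reply such as "200 result=49 endpos=1234".
RESULT_RE = re.compile(r"^200 result=(-?\d+)")


class AGISession:
    """Minimal AGI client: reads the agi_* header, then sends commands."""

    def __init__(self, stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout):
        self.stdin = stdin
        self.stdout = stdout
        self.env = self._read_env()

    def _read_env(self) -> Dict[str, str]:
        # Asterisk sends "agi_key: value" lines, terminated by a blank line.
        env: Dict[str, str] = {}
        while True:
            line = self.stdin.readline()
            if not line or not line.strip():
                break
            key, _, value = line.partition(":")
            env[key.strip()] = value.strip()
        return env

    def command(self, cmd: str) -> int:
        # Send one AGI command and return the numeric "result=" value.
        self.stdout.write(cmd + "\n")
        self.stdout.flush()
        reply = self.stdin.readline().strip()
        match = RESULT_RE.match(reply)
        if match is None:
            raise RuntimeError(f"AGI command failed: {cmd!r} -> {reply!r}")
        return int(match.group(1))


# Hypothetical menu: digit pressed -> destination name.
MENU = {"1": "sales", "2": "support"}


def run_ivr(agi: AGISession) -> Optional[str]:
    agi.command("ANSWER")
    # Play a prompt; "12" lets the caller interrupt it by pressing 1 or 2.
    # "ivr-main-menu" is a hypothetical sound file name.
    result = agi.command('STREAM FILE ivr-main-menu "12"')
    if result <= 0:
        # 0 means no digit was pressed during playback; wait up to 5 s more.
        result = agi.command("WAIT FOR DIGIT 5000")
    # result is the ASCII code of the pressed digit (0 = none, -1 = error).
    choice = MENU.get(chr(result)) if result > 0 else None
    if choice is None:
        agi.command("HANGUP")
    else:
        # Hand the choice back to the dialplan in a hypothetical variable.
        agi.command(f"SET VARIABLE IVR_CHOICE {choice}")
    return choice


# When launched by Asterisk, the entry point would be: run_ivr(AGISession())
```

It could be hooked into the dialplan with a line such as `exten => 100,1,AGI(ivr.py)` (the script name is hypothetical). Taking the streams as parameters keeps the call logic testable without a live PBX; a speech recognizer such as the CMU Sphinx project linked above would slot in where the DTMF digit is read.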