Tiered representations for audio-based multimedia and speech retrieval