Thursday, January 12, 2012

Scikit-Signal - Python for Signal Processing - Developer Talks

In this post I intend to summarize the discussion that the SciPy developers' community has been having about the idea of developing Scikit-Signal, a Python toolbox for advanced signal processing. The suggestion has received considerable interest on the mailing lists (it can be found here in the archives), and has triggered a few more threads, with discussions ranging from what such a scikit should include to how it can contribute to the SciPy ecosystem. These discussions have also been followed by debates about the very future of SciPy.

Of course, not all of that is relevant to the purpose at hand - which is to identify the scope of the scikit-signal project, with due regard to the original signal processing abilities of SciPy. Therefore the aim of this post is to consolidate the different viewpoints and suggestions that we have encountered in the threads. It would be quite premature and presumptuous to attempt to define the scope of the project at this time. We do not know what it can grow to become. So let this post serve only as a concise record of what we all have been discussing. This will hopefully allow us to streamline discussions in the future. (Like the minutes of a meeting, if you will.)

What's in a name?

Travis Oliphant wrote to me asking why I don't do this work under scipy.signal. I told him that what I really had in mind was a twofold aim: improving the signal processing routines and documentation already present in SciPy, and building scikit-signal as a dedicated, advanced signal processing package, much like the other scikits. I said that I, along with a lot of other people, have some signal processing ideas on which to write Python code. I will let the community decide later on which namespace the code should reside in. I'm happy with any namespace because I, for one, do not understand the consequences a different namespace will have in the long run. If I can perform well, it might turn out that this particular scikit does a better job of researching and developing signal processing tools than it would if we were developing in SciPy. But as Gael said, this should not preclude improvement of the existing documentation and code in scipy.signal.

This project is whatever we, the developers, want it to be. If we want to maintain it as a separate scikit, so be it. However, if it becomes indispensable later on and we want to merge it with SciPy, so be it. That would be a true mark of its maturity. No matter what, SciPy as a central toolbox will remain indispensable, as Josef Perktold said in the discussion. So, for now, let us consider the namespace issue settled, or at least put it on the back burner, and proceed with the coding.

The Status Quo

The general consensus seems to be that scipy.signal isn't in good shape. It is also limited to filter design and basic linear system analysis. Here's what some people on the list suggested (about signal processing in Python generally, not just scipy.signal).
  1. Charles R Harris suggests that filter design needs improvement.
  2. Zachary Pincus suggests that scipy.interpolate needs improvement because it overlaps with a lot of other packages. (I myself +1 this; we really need better interpolation schemes. However, I think most of it is linear algebra and not signal processing, so I think it's an independent project.) Travis Oliphant said he is looking for someone who is working on scipy.interpolate and willing to coordinate its future development.
  3. Josef Perktold suggested that we include periodograms and the Levinson-Durbin algorithm (many more have mentioned this one). He also mentioned that there is a control system toolbox that overlaps considerably with the scikit-signal idea. If so, let us track them down and take a look at their code.
  4. Scipy.wavelets could do with better documentation and examples of plotting.
  5. David Cournapeau says that we are missing some simple linear code, and scipy.signal itself needs a lot of refactoring (he also stresses periodograms and the Levinson-Durbin algorithm).
These are some of the most stressed issues with signal processing in Python that the discussion has highlighted. There might be many more, and I might have left out a few crucial issues too. If so, please point them out.
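
Since periodograms and the Levinson-Durbin algorithm came up so often, here is a minimal sketch of what such routines might look like, using NumPy only. The function names `periodogram` and `levinson_durbin`, their signatures, and the one-sided scaling convention are my own assumptions for illustration; they are not an existing scipy.signal or scikit-signal API.

```python
import numpy as np


def periodogram(x, fs=1.0):
    """One-sided periodogram estimate of the power spectral density.

    Hypothetical helper: no windowing, detrending, or averaging is done.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    psd = np.abs(spectrum) ** 2 / (fs * n)
    # Fold negative frequencies into the positive ones. DC is never
    # doubled, and neither is the Nyquist bin when n is even.
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    r     : autocorrelation values r[0], ..., r[order]
    order : order of the autoregressive (AR) model

    Returns (a, err, k): the prediction-error filter coefficients with
    a[0] == 1, the final prediction error, and the reflection coefficients.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for m in range(1, order + 1):
        # Correlation between the current filter and the next lag.
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        km = -acc / err
        k[m - 1] = km
        # Order update: add the reversed previous coefficients scaled by km.
        prev = a.copy()
        a[1:m] = prev[1:m] + km * prev[m - 1:0:-1]
        a[m] = km
        err *= 1.0 - km ** 2
    return a, err, k


# Usage: the autocorrelation 0.5**lag belongs to a first-order AR process,
# so the higher-order coefficients should come out as (numerically) zero.
a, err, k = levinson_durbin(0.5 ** np.arange(4), order=3)
print(a, err)
```

The recursion needs on the order of p^2 operations for an order-p model, against p^3 for a general linear solve, which is presumably why so many people on the list asked for it.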

Proceeding With the Scikit

The most contentious issue that came up in the discussion is why we need 'yet another' signal processing package. I won't defend the idea of a new scikit for signal processing here; that has already been done on the list. However, I believe that the concerns voiced against the project were very cogent. I wrote to Mitar Milutinovic, a researcher for Orange Data Mining, and a mentor in GSoC 2010. He had the following to say:
We had quite few sessions at Google Summer of Code mentors meeting this year about science open source. And we have found again and again that there are too many tools and everybody is implementing their own implementations again and again, instead of us collaborating. To avoid unnecessary re-implementations, inform all existing projects of the proposal of extending their project with such improvements and see which project agrees. And then extend that.
The risk of this project adding to the clutter and fragmentation of code is very real and it is our responsibility to avoid this at all costs.

To this end, we must:
  1. Give scipy.signal its due. Something that is already in scipy.signal, and is well executed and fast, should not be in the scikit. For instance, Charles R Harris says he has a Remez algorithm that works for complex filter design that belongs somewhere. It belongs in scipy.signal, with the original Remez algorithm.
  2. For other dedicated signal processing code, start by reviewing existing code and defining the scope of this project (I understand that doing it right now is premature - but we must start somewhere).
  3. Assimilate fragmented code from projects that are dead or dormant.
  4. Assimilate code from other scikits / SciPy projects like nitime, talkbox, etc.
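
As an illustration of the first point, the real-valued Remez exchange algorithm is already available as `scipy.signal.remez`, so the scikit would not need to reimplement it. The sketch below designs a small low-pass FIR filter; the tap count and band edges are arbitrary values I picked for the example, and the band edges are in cycles per sample on the assumption that the default sampling frequency of 1 is in use.

```python
import numpy as np
from scipy import signal

# Illustrative design parameters (assumptions, not from the discussion).
numtaps = 31                     # filter length
bands = [0.0, 0.1, 0.15, 0.5]    # passband 0-0.1, stopband 0.15-0.5
desired = [1.0, 0.0]             # gain wanted in each band

# Equiripple FIR design with the existing SciPy routine.
taps = signal.remez(numtaps, bands, desired)

# Inspect the frequency response; w is in radians per sample.
w, h = signal.freqz(taps, worN=512)
freqs = w / (2 * np.pi)
stopband_peak = np.abs(h[freqs >= 0.15]).max()
print(len(taps), stopband_peak)
```

A complex-valued variant such as the one Charles R Harris mentioned would then sit next to this routine rather than in a separate package.
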
So now to start coding and to wait and watch the kind of response the project receives. I hope I have managed to faithfully represent the reactions of the community through this blog post. I look forward with great excitement to the range of ideas people will bring to this project. I have a few ideas on which to write code - particularly in adaptive data analysis. But soon there will come a time when I'll need to be told what to do about this project. Till such time, I hope (rather ambitiously) that this post suffices as a starting point for the developers of scikit-signal.

1 comment:

  1. This is a nice post, with a very fair summary of the discussion. Jaidev, you will be a welcome member of the SciPy community. Coding is the first order of business, but I'm glad you are also paying attention to how best to maximize the re-use of that code.
