Used to be that in the old Linux days one could just use the command-line tool rename to mass-replace filenames with new ones. Amazingly, Mac OS X does not have this utility. Seems like you have to pay more and more for less and less.
After searching for a source code copy of rename (and no, I didn't want to write my own, simply because I know this script/program exists), all I found was MassReplaceIt. It works, but rename was so much easier to use.
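For anyone stuck in the same spot, a few lines of Python can stand in for the basic use case. This is only a rough sketch, not the original rename utility: the names `plan_renames` and `mass_rename` are made up for illustration, it does plain substring replacement only, and it defaults to a dry run so nothing is touched until you ask for it.

```python
import os


def plan_renames(names, old, new):
    """Return (source, target) pairs for every name that contains `old`."""
    if not old:
        raise ValueError("the text to replace must not be empty")
    return [(name, name.replace(old, new)) for name in names if old in name]


def mass_rename(directory, old, new, dry_run=True):
    """Replace `old` with `new` in the names of entries in `directory`.

    Returns the (source, target) pairs that were renamed, or that would
    be renamed when dry_run is True. A target that already exists is
    skipped so that no file is overwritten.
    """
    renamed = []
    for src, dst in plan_renames(sorted(os.listdir(directory)), old, new):
        src_path = os.path.join(directory, src)
        dst_path = os.path.join(directory, dst)
        if os.path.exists(dst_path):
            continue  # never clobber an existing file
        if not dry_run:
            os.rename(src_path, dst_path)
        renamed.append((src, dst))
    return renamed
```

Calling `mass_rename(".", ".txt", ".text")` only reports what would change; pass `dry_run=False` to actually rename the files.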
Sunday, October 19, 2008
Sunday, July 20, 2008
Looking for a research problem?
There seem to be four kinds of research problems:
1. Trivial and easy. An example is an experiment that asks: does the F0 contour change shape when you put narrow focus on a word? You usually know the answer already, but (a) it's easy to do and (b) you are guaranteed to get the answer you want.
2. Trivial but hard. Example: Write a dictionary of some obscure language. Such work can be useful. It's often not even all that trivial, actually; but it definitely is not the type of work that requires deep thought and insight into a problem.
3. Non-trivial and easy. This has got to be the best kind of research problem. You find a deep solution to an important open problem, but the solution is easy. It would be nice if research worked this way, but apparently it doesn't.
4. Non-trivial and hard. These are the real juicy problems, the only kind worth doing (in addition to 3 above).
Interestingly, when I look around, I mostly find people doing 1 or 2. People in category 1 just want to justify their existence, so we ignore them. People in category 2 should spend more time thinking before they leap into tackling a problem. I think people don't stop to think because of the pressure to publish more rather than better stuff. The amount of noise out there currently in science (at least in psycholinguistics) could be turned down quite a bit if people dropped category 1 and 2 type problems.
There are people who are on the border; the work is not quite trivial, but at the same time, not quite well thought out. One typical approach I see again and again is: take a known technical tool from field X and apply it to problem Y in a different field. I guess there's nothing wrong with this in principle. But as a formula for doing science? I think one could and should aspire to do better.
Sunday, July 6, 2008
Citation statistics
Here's a great article on citation counts (h-index etc).
And if this article inspires you to find out your own h-index, and you don't have access to Web of Science, google "scholar index".
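If you would rather compute it by hand from a list of citation counts, the standard definition is short enough to sketch: your h-index is the largest h such that h of your papers have at least h citations each. The function name `h_index` and the sample counts below are made up for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # counts only get smaller from here on
    return h


# Hypothetical citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))  # prints 4
```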
Wednesday, May 21, 2008
Replicable analyses
I plan to store my published data and code here: thedata.org
And an interesting book to read on data management is here.
Monday, May 12, 2008
Vengeance in (German?) academia
Recently I had occasion to experience the petty politics of academia that one reads about in novels. And then I came across this article in the New Yorker, and I now understand what politics in academia is really about: pigs.
http://www.newyorker.com/reporting/2008/04/21/080421fa_fact_diamond
Friday, April 11, 2008
Früher war alles besser ("Everything used to be better")
This is the kind of experience that is so cliched that people usually dismiss it as fiction.
My wife, child and I were standing at the Bochum train station yesterday morning, on our way to Berlin, and we got talking to an old couple. About five minutes into the conversation, the woman started telling us about how bad times had gotten to be: she had her money stolen by three black women. The husband (he had served in the German army during WW2) piped in with a wistful smile: "you know, that was the great thing about Adolf's time; such a thing would have never happened." Burn those mean green mothers from outer space.
Sunday, February 10, 2008
Releasing published data
In my so-far short career in science, I have asked a grand total of four people to release their published data or a model that they had written. Of these four, only one gave me the data. Of the others, the one whom I asked for a model refused, saying that the model was over 10 years old and he could not easily find the code. Another said the same for their data (which was published in 2002): lost, or hard to recover. A third did not answer the email (although they did answer another one, so my email probably did reach them), which I take to be a refusal.
I wonder if this is normal? Why do people not release their data? Why/how do they lose old data? It's a mystery that something as important as data, or a painstakingly built model, can get lost so easily.