Experiment 1: LEGUP vs Fitch
Method
Undergraduate student Matthew Morrow created a bare-bones version of LEGUP that worked only for Light-Up puzzles. He ran a small experiment as follows. Subjects, who were confirmed to have had no formal training in logic, came to the lab and took a pre-test of logic problems of the Knights and Knaves type. Half of the subjects were then shown an instructional video on the system of propositional logic, followed by a video on the use of the Fitch interface, and were given some time to solve propositional logic problems using the Fitch interface. The other half were shown a video on the Light-Up puzzle, followed by a video on the use of the LEGUP interface, and were then given some time to solve Light-Up puzzles using the LEGUP interface. Finally, both groups were given a post-test of Knights and Knaves logic puzzles.
Results

The six pairs of bars on the left are from subjects who received logical reasoning instruction using Fitch. The six pairs on the right are from subjects who used LEGUP. The blue bars represent performance on the pre-test of logical reasoning, and the red bars represent performance on the post-test. On average, the Fitch users got almost 2 more answers correct on the post-test than on the pre-test, while the LEGUP users improved by over 8 correct answers (Morrow, 2009).

In terms of accuracy, the LEGUP subjects increased their percentage of correct answers by 16.7%, while the Fitch users improved their percentage of correct answers by 5.8%.
Discussion
While the subject pool is obviously very small, the results do suggest some advantage of the LEGUP system over the Fitch system. That is, both groups increased their logical reasoning performance, but the LEGUP users improved more than the Fitch users. It is interesting to note that Knights and Knaves puzzles can be represented in propositional format and, given enough time, can be solved using Fitch. This is not true of the LEGUP interface as provided, which could only be used to solve Light-Up puzzles. Then again, perhaps the Fitch subjects' very limited exposure to propositional logic, together with their attempts to solve the problems using the Fitch system with only paper and pencil rather than the interface, is exactly what made them improve less. Clearly, more research is needed here.
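To make the point about propositional representation concrete, here is a minimal sketch of how a Knights and Knaves puzzle reduces to propositional constraints. The puzzle (A says "We are both knaves") is a standard textbook example chosen for illustration and is an assumption, not an item from the study's tests. The function name `solve` and the dictionary encoding are likewise hypothetical. The sketch checks every truth assignment by brute force rather than building a Fitch-style proof.

```python
from itertools import product

def solve(statements, names):
    """Return every knight/knave assignment consistent with the statements.

    statements maps a speaker's name to a function that takes an assignment
    (name -> True if that person is a knight) and returns the truth value
    of what the speaker said.
    """
    solutions = []
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # Knights always tell the truth and knaves always lie, so a speaker
        # is a knight exactly when the statement is true: speaker <-> claim.
        if all(world[speaker] == claim(world)
               for speaker, claim in statements.items()):
            solutions.append(world)
    return solutions

# Illustrative puzzle (assumed, not from the study): A says "We are both knaves."
# Propositionally: A <-> (not A and not B), where A means "A is a knight".
puzzle = {"A": lambda w: (not w["A"]) and (not w["B"])}
solutions = solve(puzzle, ["A", "B"])
print(solutions)
```

The only consistent assignment makes A a knave and B a knight. A Fitch proof would reach the same conclusion by derivation from the biconditional instead of by enumeration.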
References
Morrow, M. (2009). Logical Learning: Modern Formal Logic Teaching Tools vs. Logic Puzzle Learning. Psychology Undergraduate Thesis, RPI.