Posts Tagged open source
64-bit Scientific Python on Windows
Posted by mailletf in Programming on September 7, 2010
Getting a 64-bit installation of Python with scientific packages on our dear Windows isn't as simple as running an apt-get or port command. There is an official 64-bit Python build available, but extensions like numpy, scipy and matplotlib only have official 32-bit builds. There are commercial distributions, such as Enthought's, that offer all the packages built in 64-bit, but at around $200 per license, this was not an option for me.
I stumbled upon the Python Extension Packages for Windows page, which contains dozens of extensions compiled for Python 2.5, 2.6 and 2.7 in both 32-bit and 64-bit versions. With these packages, I was able to get a working installation in no time.
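Extension packages have to match the interpreter's bitness (and Python version), so it is worth confirming that the Python you installed really is a 64-bit build before picking installers. A minimal check using only the standard library; the function names are just for illustration:

```python
import platform
import struct
import sys


def interpreter_bits():
    """Pointer size of the running interpreter in bits: 32 or 64."""
    # struct.calcsize("P") is the size in bytes of a C pointer in this build.
    return struct.calcsize("P") * 8


def describe_interpreter():
    """One-line summary to compare against the installers you downloaded."""
    # sys.maxsize only exceeds 2**32 on 64-bit builds; shown as a cross-check.
    return "Python %s, %d-bit (sys.maxsize > 2**32: %s) on %s" % (
        platform.python_version(),
        interpreter_bits(),
        sys.maxsize > 2**32,
        platform.system(),
    )


if __name__ == "__main__":
    print(describe_interpreter())
```

If this reports 32-bit, a 64-bit numpy or scipy installer will not work with that interpreter, whatever the OS itself is.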
My time at Sun Labs and pyaura
Posted by mailletf in Programming, Voyage on October 9, 2009
My internship at Sun Microsystems Labs, which has been going on for about 15 months – 9 of those full time at their campus in the Boston area – is coming to an end. During the course of those months, I’ve met a lot of very smart and fun people, I’ve worked on very challenging and stimulating problems and I’ve discovered a bunch of really good New England beers.
All my work has been centered around the Aura datastore, an open-source, scalable and distributed recommendation platform. The datastore is designed to handle millions of users and items and can generate content-based recommendations from each item's aura (aka tag cloud).
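The post does not say exactly how the datastore compares two auras. As a hedged illustration only, assuming an aura is a map from tag to weight and that similarity is measured with cosine similarity (a common choice for tag vectors, not necessarily what Aura does), content-based recommendation from tag clouds could be sketched like this; `aura_similarity` and `most_similar` are hypothetical names:

```python
import math


def aura_similarity(aura_a, aura_b):
    """Cosine similarity between two tag clouds given as {tag: weight} dicts."""
    # Only tags present in both clouds contribute to the dot product.
    shared = set(aura_a) & set(aura_b)
    dot = sum(aura_a[t] * aura_b[t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in aura_a.values()))
    norm_b = math.sqrt(sum(w * w for w in aura_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty aura is not similar to anything
    return dot / (norm_a * norm_b)


def most_similar(target, auras, n=3):
    """Rank the other items in `auras` by how similar their tag clouds are to `target`'s."""
    scores = [(aura_similarity(auras[target], aura), item)
              for item, aura in auras.items() if item != target]
    scores.sort(reverse=True)  # highest similarity first
    return [item for _, item in scores[:n]]


if __name__ == "__main__":
    auras = {
        "artist_a": {"rock": 10, "indie": 5},
        "artist_b": {"rock": 8, "indie": 6, "pop": 1},
        "artist_c": {"jazz": 9, "swing": 4},
    }
    print(most_similar("artist_a", auras, n=2))  # → ['artist_b', 'artist_c']
```

A brute-force ranking like this touches every item, which is exactly the cost a datastore built for millions of items has to avoid.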
Last summer, under the supervision of Paul Lamere, I worked a lot on our music recommendation web application, called the Music Explaura, and designed a steerable recommendation interface. (We also have a Facebook companion app to the Explaura that was created by Jeff Alexander.)
This summer, I worked with Steve Green on many different things, including what I’d like to talk about in this post, pyaura, a Python interface to the datastore.
pyaura
The idea behind pyaura is to get the best of both worlds. While the datastore is very good at what it does – storing millions of items and computing the similarity between all of them very quickly – the Java framework surrounding it is a bit too rigid for quickly hacking random research code on top of it. My actual goal was to experiment with ways of automatically cleaning up and clustering social tags, and I was missing the flexibility I was used to getting when working on projects in Python's interactive environment.
Without going into details: since the datastore is distributed and has many different components, it uses a technology called Jini to automatically hook them all together. Jini takes care of automatic service discovery, so you don't have to manually specify IP addresses and so on. It also allows components to publicly export functions that remote components can call. A concrete example would be the datastore head component allowing the web server component to call its getSimilarity() function on two items. The computation happens in the datastore head, and the results are then shipped across the wire to the web server so it can serve its request. However, Jini only supports Java, leaving us no direct way to connect to the datastore from Python.
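Jini's real API is Java and is not reproduced here. To illustrate just the two ideas described above (finding a component by what it offers rather than by its address, and calling a function whose computation runs inside another component), here is a toy, single-process Python model. `ServiceRegistry`, `DatastoreHead`, `WebServer` and `get_similarity` are hypothetical names, no networking is involved, and the Jaccard index is only a placeholder similarity measure:

```python
class ServiceRegistry:
    """Toy stand-in for a lookup service: providers register under an interface name."""

    def __init__(self):
        self._services = {}

    def register(self, interface, provider):
        # A real discovery system would also handle network locations and leases.
        self._services.setdefault(interface, []).append(provider)

    def lookup(self, interface):
        """Return the first provider registered for `interface`."""
        providers = self._services.get(interface)
        if not providers:
            raise LookupError("no provider registered for %r" % interface)
        return providers[0]


class DatastoreHead:
    """Hypothetical datastore component that exports a similarity function."""

    def __init__(self, tag_sets):
        self._tag_sets = tag_sets  # item name -> set of tags

    def get_similarity(self, item_a, item_b):
        # The computation runs here, in the "datastore head"; only a float goes back.
        a, b = self._tag_sets[item_a], self._tag_sets[item_b]
        union = a | b
        if not union:
            return 0.0
        return len(a & b) / float(len(union))  # Jaccard index, chosen only for brevity


class WebServer:
    """Hypothetical client that finds the datastore by interface, not by IP address."""

    def __init__(self, registry):
        self._datastore = registry.lookup("DataStore")

    def handle_similarity_request(self, item_a, item_b):
        return self._datastore.get_similarity(item_a, item_b)


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register("DataStore", DatastoreHead({"a": {"rock", "indie"}, "b": {"rock"}}))
    print(WebServer(registry).handle_similarity_request("a", "b"))  # → 0.5
```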
After looking around for a bit, I stumbled upon a project called JPype, which essentially allows you to launch a JVM inside Python. This lets you instantiate and use Java objects completely transparently from within Python. Using JPype, I built two modules which, together, allow very simple access to the datastore through Python.
- AuraBridge: A Java implementation of the Aura datastore interface. The bridge knows about the actual datastore because it can locate it and talk to it using Jini.
- pyaura: A set of Python helper functions (mostly automatic type conversion). pyaura instantiates an AuraBridge instance using JPype and uses it as a proxy to get data to and from the datastore.
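The actual pyaura code is not shown in the post. As a rough sketch of the proxy-plus-type-conversion idea only, here a plain Python object stands in for the JPype-wrapped AuraBridge so the example runs without a JVM; `PyAuraProxy`, `to_python_list`, `FakeBridge`, `FakeJavaList`, `getArtistNames` and `getItemCount` are all hypothetical:

```python
def to_python_list(java_list):
    """Copy a Java-style list (exposing size() and get(i)) into a native Python list."""
    return [java_list.get(i) for i in range(java_list.size())]


class PyAuraProxy:
    """Forward method calls to a bridge object, converting Java-style lists on the way back."""

    def __init__(self, bridge):
        self._bridge = bridge

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself, i.e. bridge methods.
        target = getattr(self._bridge, name)
        if not callable(target):
            return target

        def call(*args, **kwargs):
            result = target(*args, **kwargs)
            if hasattr(result, "size") and hasattr(result, "get"):
                return to_python_list(result)
            return result  # anything else is passed through unchanged

        return call


class FakeJavaList:
    """Hypothetical stand-in for a Java List returned through JPype."""

    def __init__(self, items):
        self._items = list(items)

    def size(self):
        return len(self._items)

    def get(self, index):
        return self._items[index]


class FakeBridge:
    """Hypothetical stand-in for the Java AuraBridge, so the sketch runs without a JVM."""

    def getArtistNames(self):
        return FakeJavaList(["artist_1", "artist_2"])

    def getItemCount(self):
        return 2


if __name__ == "__main__":
    aura = PyAuraProxy(FakeBridge())
    print(aura.getArtistNames())  # → ['artist_1', 'artist_2']
```

Putting the conversion in one proxy keeps research code free of Java types, at the cost of copying large collections into Python.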
Example
To demonstrate how easy things become with pyaura, imagine you are running an Aura datastore and have collected a lot of artist and tag information from the web. You might be interested in quickly seeing how many artists have been tagged with each individual tag you know about. With these few lines of code, you can get a nice histogram that answers just that question:
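import pyaura.bridge as B
import pylab as P

# Connect to the datastore through the AuraBridge
aB = B.AuraBridge()
# For every tag, count the artists it has been applied to
counts = [len(tag.getTaggedArtist()) for tag in aB.get_all_iterator("ARTIST_TAG")]
# Plot the distribution of those counts
P.hist(counts)
P.show()  # not needed in an interactive pylab session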
The above code produces the following plot:

This is the result we expect, given that the plot was generated with a datastore containing 100,000 artists. As less and less popular artists are added to the datastore, the effects of sparsity in social data kick in: less popular artists are tagged with fewer tags than popular ones, which leads to a situation where very few tags have been applied to more than 5,000 artists.
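To read that long tail numerically rather than off the plot, the `counts` list from the example above could be summarized as below. The bucket width and the 5,000 threshold are arbitrary choices, the function names are hypothetical, and the sample values in the test are made up for illustration, not taken from the datastore:

```python
from collections import Counter


def bucket_counts(counts, bucket_size=1000):
    """Group per-tag artist counts into fixed-width buckets, like a coarse histogram."""
    buckets = Counter((c // bucket_size) * bucket_size for c in counts)
    return sorted(buckets.items())  # [(bucket lower bound, number of tags), ...]


def share_above(counts, threshold=5000):
    """Fraction of tags applied to more than `threshold` artists."""
    if not counts:
        return 0.0
    return sum(1 for c in counts if c > threshold) / float(len(counts))
```

With the real data, `share_above(counts)` gives the fraction of tags that reached more than 5,000 artists, which should be small if the plot's long tail holds.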
This is a small example, but it shows how simple pyaura is to use. With very few lines of code, you can do pretty much anything with the data stored in Aura. Hopefully this will make the Aura datastore more accessible and attractive to projects that want to take advantage of its scalability and raw power while keeping the flexibility to quickly hack on top of it.
trafshow : Display current network traffic
Posted by mailletf in Technology on February 26, 2009
trafshow is a simple little program that displays the current traffic on a network interface.

It listens on a given interface in promiscuous mode and displays information on each connection, its remote address and the amount of traffic going back and forth.
It can easily be installed on a Mac via MacPorts.
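trafshow does the packet capture itself, which needs elevated privileges and a capture library. The per-connection bookkeeping it displays can be illustrated with a hedged, pure-Python sketch that takes already-captured packet records; the `(src, dst, nbytes)` record format and the `summarize_flows` name are assumptions, not trafshow's internals:

```python
from collections import defaultdict


def summarize_flows(packets):
    """Total bytes per connection from (src, dst, nbytes) records.

    A->B and B->A are merged into one connection keyed by the sorted endpoint pair;
    the value is [bytes from first endpoint to second, bytes from second to first].
    """
    totals = defaultdict(lambda: [0, 0])
    for src, dst, nbytes in packets:
        first, second = sorted((src, dst))
        direction = 0 if src == first else 1
        totals[(first, second)][direction] += nbytes
    # Busiest connections first, as a traffic display would list them.
    return sorted(totals.items(), key=lambda item: sum(item[1]), reverse=True)


if __name__ == "__main__":
    captured = [
        ("10.0.0.2:5000", "203.0.113.5:80", 120),
        ("203.0.113.5:80", "10.0.0.2:5000", 1500),
        ("10.0.0.3:6000", "10.0.0.1:53", 60),
    ]
    for (a, b), (a_to_b, b_to_a) in summarize_flows(captured):
        print("%s -> %s: %d B, back: %d B" % (a, b, a_to_b, b_to_a))
```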
pfSense : a software alternative to your old router/firewall
Posted by mailletf in Linux, Technology on January 13, 2009
My old D-Link router, like pretty much every other router I've ever owned, wasn't very reliable, so I was looking for an open-source alternative firmware like Tomato to flash it with. Given the clear lack of effort put into the official firmware, I thought it couldn't hurt to try. Unfortunately, my router wasn't supported by any third-party firmware.
During my search, however, I stumbled upon pfSense, a FreeBSD-based router/firewall distro. It's small (<100 MB), runs on a 100 MHz PC and includes all the features you would get on a very expensive commercial router (firewall, NAT, VPN server, usage graphs, dynamic DNS support, per-IP bandwidth usage, QoS, etc.).
I already had a dedicated file server, so I installed pfSense as a VM on it using VMware (I could also have used VirtualBox, a free alternative to VMware). All you need are two NICs. I now only use my old router as a wireless access point, since pfSense has its own DHCP server. I could even get rid of my D-Link router entirely if I added a wireless NIC to my server.
If you have an old PC lying around, or one that could host a pfSense VM, all you might need is an extra NIC to get an enterprise-grade router that will cooperate a lot more than any cheap $50 D-Link/Linksys/Netgear/etc. router.
How does your web page look in every browser? (Updated)
Posted by mailletf in Web design on June 1, 2008
Every web designer knows that making a web page come out just right in every browser can cause quite a headache, especially when combining elements like W3C standards and IE6. It’s hard to have a working copy of all the different browsers and all the different versions to test. Browsershots.org to the rescue!
Browsershots makes screenshots of your web design in different browsers. It is a free, open-source online service created by Johann C. Rocholl. When you submit your web address, it is added to the job queue. A number of distributed computers then open your website in their browsers, make screenshots and upload them to the central server.
It works great, and they have 15 different browsers running on Linux, Mac OS, Windows and FreeBSD.
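The submit, queue, render and upload flow described above is a classic producer/consumer work queue. As a toy, single-machine sketch only: threads stand in for the distributed machines, `fake_screenshot` and `run_screenshot_farm` are hypothetical names, the "screenshot" is just a string, and any worker may render any browser, whereas a real machine can only render the browsers installed on it:

```python
import queue
import threading


def fake_screenshot(browser, url):
    """Hypothetical placeholder: a real worker would launch `browser` and capture an image."""
    return "%s rendered by %s" % (url, browser)


def run_screenshot_farm(url, browsers, num_workers=3):
    """Queue one job per browser, let worker threads drain the queue, collect results centrally."""
    jobs = queue.Queue()
    for browser in browsers:
        jobs.put((browser, url))  # one job per requested browser

    results = {}  # plays the role of the central server receiving uploads
    lock = threading.Lock()

    def worker():
        while True:
            try:
                browser, target = jobs.get_nowait()
            except queue.Empty:
                return  # all jobs were queued up front, so empty means done
            shot = fake_screenshot(browser, target)
            with lock:  # "upload" the screenshot
                results[browser] = shot

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    shots = run_screenshot_farm("http://example.com/", ["Firefox 3", "Opera 9", "IE 6"])
    for browser, shot in sorted(shots.items()):
        print(browser, "->", shot)
```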
My blog came out perfectly on most browsers and platforms, even exotic ones like Kazehakase. Luckily for me, Microsoft was there to save the day, or else I wouldn’t have had any pictures to show.
Update: Another website, IE NetRenderer, lets you get instant screenshots of your site in different versions of IE.
attach failed – no mountable file systems
It was such a lovely evening, if a little cold, after a not-so-productive day at work, but nothing a good Samuel Adams Black Lager (yes, that's what I drink now) couldn't cure. After a bit of a fight with my chicken, which didn't think a day in the fridge was enough to thaw out, I was ready to get a little work done in the evening.
To give you some context: since I was going away for three months, I bought an external hard drive to bring along my precious TV series, all my MP3s, programs and games; an amalgam of binary sequences that lets an exiled nerd function. Being a model citizen, as we all are, I legally buy, just like you, all the software I use and the music I take such great pleasure in listening to. Still, purely to ensure a bit of privacy in case I was subjected, in the name of national security, to an exhaustive search when entering America, I wanted to protect my precious data by encrypting it with TrueCrypt. With a brute-force attack, the best computer would need more time than the age of the universe itself to break the encryption. A superfluous detail, but fun to say.
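For the curious, that claim can be checked with back-of-the-envelope arithmetic. The figures are assumptions, not measurements: a random 256-bit key (the key size TrueCrypt uses for AES), an attacker testing 10^12 keys per second, and an attack on the key itself rather than on a weaker passphrase. The function name is just for illustration:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9  # approximate


def brute_force_years(key_bits, keys_per_second):
    """Expected years to find a random key by exhaustive search (half the key space on average)."""
    expected_trials = 2 ** (key_bits - 1)
    return expected_trials / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = brute_force_years(256, 1e12)  # assumed: 256-bit key, 10**12 guesses per second
    print("%.1e years, or %.1e times the age of the universe"
          % (years, years / AGE_OF_UNIVERSE_YEARS))
```

Even a million times faster attacker barely dents the result, since every extra key bit doubles the work.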
Why am I sharing this seemingly pointless information? Simply because, after enjoying the chicken I mentioned earlier, I was confronted with the following message:

Let me translate for all the computer people in the room: I lost the equivalent of 350 gigabytes of data. You can also picture it as a stack of 500 CDs. Poof, gone. Yes, I have copies in Montréal, but $150 worth of gas and 10 hours of driving stand between me and them. I know... I know what you're all thinking: things could be worse, and I agree. But in my egocentric little universe, this good news causes a slight emotional disturbance.
I'll still leave you on a happier note, with a few photos from the wedding of my very good friend Patrice, which took place last Saturday.
2 p.m.
8 p.m.
11 p.m.
Music: Portishead (Dummy), Mark Rapp, Radiohead
Useful free Mac apps
I’m a relatively new Mac user so I’m keeping a list of some useful free apps that I’m using on my Mac. It’s a work in progress…
- Instant messaging: Adium
- PDF annotation: Skim
- Note-taking: FreeMind
- Package manager for Linux/Unix software: MacPorts (lots of my friends use Fink)
- EquationService: create equations from LaTeX that can be used in Keynote
- MacFUSE: software that allows you to write arbitrary file systems as user-space programs
- Creative MP3 player support: XNJB
- OSX Ext2 Filesystem
- Switch: audio file converter
- Cyberduck: SFTP/FTP client
- Espérance DV: create a RAM disk
