Thursday, December 27, 2012

My First Instructables Post!

If you are skimming, my instructables pages can be found here.

If you want a small explanation and summary of the project, please, continue........  :)

For my Tangible Interactive Computing class (838F), our final project had to include, among other things, an Instructables page on the site.  The creation of this page was to replace the typical research paper that usually accompanies a term project.  The idea was to force us to create a step-by-step document of how we built our projects.  The recent explosion of open-source hardware/DIY makers is fostered by such websites and blogs where people share their experiences working with hardware such as the Arduino.

Our project (myself and Preeti Bhargava) was, to put it simply, a personal informatics black box that you clip onto your belt.  As you go about your day, your activity, surrounding environment and locations are tracked to produce a report telling you how much time you spent inside and outside.  The device leverages a system-of-systems providing context-aware analysis for such ubiquitous computing devices.  We make sense of the raw data signals from the multitude of sensors through our back-end system, which uses many external API calls, home-grown context-aware systems and AI decision-tree analysis.

A brief video explanation of our project:

We in the class all agreed it would be a great idea to post something to the site vs. a research paper, but I believe we were all surprised at how receptive and supportive the community is!  Four of the seven projects (you can view all seven here) were eventually featured on the front page of the site!  Our page currently has over 20,000 views!

So, if you haven't checked out our step-by-step process yet, please do!  It shows how the hardware was made, including the soldering involved, the 3D printing and the end product.  I hope you are somewhat inspired or excited to explore this area of DIY hardware hacking.  I know I feel empowered as a result of the projects I contributed to in this semester-long class.  As a CS Master's/PhD student I typically took the hardware of the systems I was working with at face value.  I now know I can not only create the software that runs and extracts data from hardware but also design my own hardware and sensor systems!

I thank Dr. Jon Froehlich for that hardware liberation!

Saturday, December 8, 2012

Webservice to determine if a Latitude / Longitude Point is inside a polygon or building

I spent some time trying to find an easy, do-it-yourself method to build a web service that takes a lat/long as input and tells me if that point is contained within a polygon.  For my purposes I wanted to know whether a point was within a building or not, and a polygon easily satisfies this requirement.

The ray-casting algorithm is a method to determine just that.  Google Maps and Esri have function calls that one can use within their APIs, but those are client-side, as you can only call such functions through JavaScript, jQuery or the like.  Thus if you want something on the server side you must write it yourself.

Luckily I found this Rosetta Code site that had the ray-casting algorithm coded in several languages!  Their example uses pre-canned polygons and points.  I modified it to accept a lat/long from a user and return true if the point is within any of my polygons/buildings and false otherwise.  The list of polys contains the lat/long points for each building.

My python code:

# Code was taken from Rosetta Code
# uses simple ray-casting algorithm to see if point lies within a polygon
# Adapted by Nick Gramsky to be used as a service to check if LatLong was inside  
#    a GIS polygon
from collections import namedtuple
from pprint import pprint as pp
import sys
Pt = namedtuple('Pt', 'x, y')               # Point
Edge = namedtuple('Edge', 'a, b')           # Polygon edge from a to b
Poly = namedtuple('Poly', 'name, edges')    # Polygon
_eps = 0.00001
_huge = sys.float_info.max
_tiny = sys.float_info.min
def rayintersectseg(p, edge):
    ''' takes a point p=Pt() and an edge of two endpoints a,b=Pt() of a line segment; returns boolean '''
    a,b = edge
    if a.y > b.y:
        a,b = b,a
    if p.y == a.y or p.y == b.y:
        p = Pt(p.x, p.y + _eps)
    intersect = False
    if (p.y > b.y or p.y < a.y) or (
        p.x > max(a.x, b.x)):
        return False
    if p.x < min(a.x, b.x):
        intersect = True
    else:
        if abs(a.x - b.x) > _tiny:
            m_red = (b.y - a.y) / float(b.x - a.x)
        else:
            m_red = _huge
        if abs(a.x - p.x) > _tiny:
            m_blue = (p.y - a.y) / float(p.x - a.x)
        else:
            m_blue = _huge
        intersect = m_blue >= m_red
    return intersect
def _odd(x): return x%2 == 1
def ispointinside(p, poly):
    return _odd(sum(rayintersectseg(p, edge)
                    for edge in poly.edges))
def polypp(poly):
    print "\n  Polygon(name='%s', edges=(" % poly.name
    print '   ', ',\n    '.join(str(e) for e in poly.edges) + '\n    ))'
if __name__ == '__main__':
    polys = [
      Poly(name='avwilliams', edges=(
        Edge(a=Pt(x=38.9913160, y=-76.937079), b=Pt(x=38.991333, y=-76.936119)),
        Edge(a=Pt(x=38.991333, y=-76.936119), b=Pt(x=38.990287, y=-76.936108)),
        Edge(a=Pt(x=38.990287, y=-76.936108), b=Pt(x=38.990278, y=-76.937057)),
        Edge(a=Pt(x=38.990278, y=-76.937057), b=Pt(x=38.990495,y=-76.937052)),
        Edge(a=Pt(x=38.990495,y=-76.937052), b=Pt(x=38.990499,y=-76.936424)),
        Edge(a=Pt(x=38.990499,y=-76.936424), b=Pt(x=38.991091,y=-76.93643)),
        Edge(a=Pt(x=38.991091,y=-76.93643), b=Pt(x=38.991104,y=-76.937079)),
        Edge(a=Pt(x=38.991104,y=-76.937079), b=Pt(x=38.9913160, y=-76.937079))
      ))
    ]

    if len(sys.argv) != 3:
        print "Incorrect number of arguments.  Please submit a lat and a long...."
        sys.exit(1)

    userpoint = (Pt(x=float(sys.argv[1]), y=float(sys.argv[2])))

    testpoints = (Pt(x=float(sys.argv[1]), y=float(sys.argv[2])), Pt(x=38.990842, y=-76.93625),
                  Pt(x=38.9021466, y=-77), Pt(x=0, y=5),
                  Pt(x=10, y=5), Pt(x=8, y=5),
                  Pt(x=10, y=10))
    inside = False
    for poly in polys:
        #print '   ', '\t'.join("%s: %s" % (p, ispointinside(p, poly))
        #                       for p in testpoints[:3])
        #if ispointinside(testpoints[1], poly):
        if ispointinside(userpoint, poly):
           inside = True

    print inside

And the corresponding php code:

Oh, and this was done on a Mac, hence the need to call out 'python' in the system call.  Syntax for this call will vary by OS.  Of course this example is using a hard-coded polygon within the Python script.  That's fine if you want to check only a few polygons/buildings, but not a great way to code if you want to check many buildings.  One could simply create a small database that stores polygons and iterate through each polygon.  Using Postgres you can perform a bounds call to retrieve only the polygons that contain the point within their minimum bounding rectangle, and run this Python code over just those.  Additionally this could very well be coded in PHP, but I just didn't have the time :)  It would remove the need to perform a system call and reduce the software dependencies....
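The bounding-rectangle idea is simple enough to sketch in plain Python: compute each polygon's minimum bounding rectangle once, and only run the more expensive ray-casting test when the rectangle actually contains the point.  The footprint below is a made-up polygon, not a real building.

```python
def bbox(points):
    # Minimum bounding rectangle of a polygon given as (x, y) vertices.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def bbox_contains(box, x, y):
    minx, miny, maxx, maxy = box
    return minx <= x <= maxx and miny <= y <= maxy

# Hypothetical building footprint as (lat, long) vertices.
building = [(38.990, -76.937), (38.991, -76.937), (38.991, -76.936), (38.990, -76.936)]
box = bbox(building)

print(bbox_contains(box, 38.9905, -76.9365))  # True  -> worth running ray casting
print(bbox_contains(box, 39.000, -76.9365))   # False -> skip this polygon entirely
```

Only the polygons that survive this cheap filter need the full point-in-polygon test.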

 <PUBLIC SERVICE ANNOUNCEMENT> Yes, my php code IS very light and simple. One should ALWAYS clean user input prior to working with the parameters they provide. My example/prototype did not include that (BAD DEVELOPER) but this was an internal prototype only. </PUBLIC SERVICE ANNOUNCEMENT>
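In that spirit, here is one way the lat/long parameters could be validated before anything else touches them; this is a sketch of what that cleaning could look like, not the code from my prototype:

```python
def parse_latlong(lat_str, long_str):
    # Reject anything that is not a number in valid coordinate ranges.
    try:
        lat = float(lat_str)
        lng = float(long_str)
    except (TypeError, ValueError):
        return None
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0):
        return None
    return (lat, lng)

print(parse_latlong("38.9907", "-76.9367"))   # (38.9907, -76.9367)
print(parse_latlong("not-a-number", "0"))     # None
print(parse_latlong("91", "0"))               # None (latitude out of range)
```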

Wednesday, June 6, 2012

How I gained over 100 Twitter followers in one week without writing one Tweet. EVER....

So I stumbled on this by accident while working on TwitterStand earlier this semester.  I thought it was quite interesting and worth a (semi-)formal study.  So I set up a new Twitter account @UniqueShout.

This is what the account looked like prior to adding any new followers:

Just your standard, normal account that follows 5 accounts and sits idle.  I created the account on May 25.

Almost instantly I followed 1,000 people using the Twitter API.  36 hours later I followed another 790; by that point, however, 30 Twitter users were already following me...

One week later I'm sitting at 105 followers, yet I have yet to produce A SINGLE TWEET!

It's now been two weeks and the rate of new followers has subsided, yet I currently have 111 followers.  Several accounts have stopped following me; I believe these were spam/porn accounts.

So you're likely saying "Most of these accounts must be spam/porn accounts!".  Well, not exactly.  Porn seems to account for 10% of the followers.  I think many of the accounts that dropped were spam accounts.  Let me first step back and say something about the accounts in general that are following @UniqueShout.  They fall into one of four categories:

  1. Spam/porn accounts - These accounts are not real people, but they exhibit the following behavior:
    1. All have a hyperlink in their 'about' section that leads to (what I believe to be) a porn site.
    2. They all produce 3 Tweets over a 36-hour period.  Each Tweet looks to be crafted to be about nothing, yet has text that could possibly fool spam filters like those found on email servers.
    3. All accounts produce their Tweets at roughly the same time, suggesting these accounts are run by the same person/entity.
  2. Real people who love to Tweet.  These are people who Tweet about their daily lives and random thoughts.  This looks to be 10% of the people who are following this account.
  3. Legit newsworthy Twitter users, like local CNN correspondents, local news affiliates and other microbloggers.  This looks to be ~30% of the followers.
  4. Mixed accounts that include small businesses, organizations and users that communicate over Twitter.  I don't consider these personal accounts as the account might be a front for a small business, a personal business or organization.  This appears to be about 50% of @UniqueShout's followers.
Here is the latest snapshot from the account showing 111 followers.  Note some of the accounts that are following @UniqueShout, one of which is even @ESPNDeportesMia.

I think the following takeaways should be noted here:
  1. It's quite interesting WHY the account has so many followers despite the lack of Tweets.  I'm now curious about the possible mining techniques Twitter-mining sites have to decide who to follow.
  2. I may have produced a method for identifying spam.  While not optimal or exhaustive, it appears to find accounts that are owned by similar entities.  Is this a possible method to trap/ID spam BEFORE it does its dirty work?
  3. The accounts I followed are mostly news-spreading accounts.  But not all of them.  2% of the accounts have never Tweeted and are followed by very few people.  Some of the accounts are icons on Twitter (@cnn for example) yet some are small local news outlets.  A further study should find the minimal subset that produces the most followers.
  4. I'm unsure what happens when I stop following the accounts @UniqueShout is following.  This will be a follow-on study.
  5. I'm unsure if this is a good method to add followers or not.  One thing is for sure, if you want a high-follower count, this is a way to get a decent amount quickly and easily.  So if you're setting up a small business for example, this might be a method to jump-start your Twitter account.  At some point I'll do this to my established, personal account and see if I get similar results.

I'll take a deep dive into these accounts in my next blog.  

Monday, November 14, 2011

Visualization of Social Network behind #OccupyWallStreet Twitter Hashtag

The following visualization was created using Microsoft NodeXL and the 'Group in a Box' method to show clusters within the OccupyWallStreet Hashtag social network.

Nodes are sized according to their in-degree, or the number of times someone has mentioned that user in a tweet.  The images for each node are the actual profile images of each user.
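In-degree here is just a count of incoming mention edges, so given an edge list of (tweeter, mentioned) pairs it can be tallied with a Counter.  The edges below are made up for illustration:

```python
from collections import Counter

# Hypothetical mention edges: (tweeter, mentioned_user).
edges = [('bob', 'alice'), ('carol', 'alice'), ('dave', 'bob')]

# In-degree: how many times each user was mentioned.
indegree = Counter(mentioned for _, mentioned in edges)

print(indegree['alice'])   # 2
print(indegree['bob'])     # 1
```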

This image contrasts with a similar image created by Marc Smith.  I'm unsure if the difference is due to the fact that his data is from 10/8/2011 or that I'm still a bit new to the Group in a Box feature of NodeXL.  I don't see many other ways to cluster the nodes, so I'm inclined, at this moment, to say the difference in network structure is due to a shift in the composition of the network caused by the many events surrounding the 'Occupy' movement over the past month.

The data for this visualization was captured over a 22-hour period from the evening of 11/12/2011 - the afternoon of 11/13/2011. Previous blog posts show how to use Python and MongoDB to store and parse this data.

For comparison purposes, the below image is the same network without the Group in a Box clustering method applied:

What to do with my Twitter data once it is in my MongoDB?

My previous blog post showed how we can use Python, pycurl, pymongo, MongoDB and the Twitter Streaming API to import all tweets with a certain hashtag into our database.  Once we have all of that data, how can we parse it so we can effectively use it?  My last example collected the entire tweet.

Tweets, though limited to only 140 characters, are actually large when you observe the entire JSON object.  (Recall the API returns the Tweet as a JSON object.)  An example tweet shows the large JSON structure.  There is a lot of information in a Tweet, so capturing the entire thing is worthwhile, especially since it is just a few bytes of storage per tweet.  However, we'll need to parse each tweet to analyze the structure of our dataset.  I won't get into the specifics of how to use JSON or the entire Twitter JSON object, but one will have to have a general understanding of JSON to fully understand the example shown below.
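To make the structure concrete, here is a heavily trimmed, hypothetical tweet showing only the fields used below; a real tweet carries dozens more fields:

```python
import json

# A heavily trimmed, hypothetical tweet -- not a real Twitter payload.
raw = '''{
  "text": "hello @alice from #occupywallstreet",
  "user": {"screen_name": "bob",
           "profile_image_url_https": "https://example.com/bob.png"},
  "entities": {"user_mentions": [{"screen_name": "alice"}]}
}'''

tweet = json.loads(raw)
print(tweet['user']['screen_name'])                                    # bob
print([m['screen_name'] for m in tweet['entities']['user_mentions']])  # ['alice']
```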

So, let's say we want to map the social network at play in a Twitter database.  We would want to extract the user ID of the tweeter and whatever other users they tweet about.  We can query the database for certain fields of each tweet.  We will want the entities.user_mentions.screen_name array and the user.screen_name string.  We'll loop through all of our tweets and print out a list of edges that together form a social graph.  In this example, if a user does not tweet about anyone, I still capture the tweet and show the link as a self-loop in order to capture the out-degree (for network-analysis reasons) of each 'tweeter'.
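Stripped of the database, that edge logic can be sketched on a couple of hand-made tweets (hypothetical data):

```python
# Hypothetical in-memory tweets standing in for documents returned by MongoDB.
posts = [
    {'user': {'screen_name': 'bob'},
     'entities': {'user_mentions': [{'screen_name': 'alice'}, {'screen_name': 'carol'}]}},
    {'user': {'screen_name': 'dave'},
     'entities': {'user_mentions': []}},   # no mentions -> self-loop
]

edges = []
for post in posts:
    tweeter = post['user']['screen_name']
    mentions = post['entities']['user_mentions']
    if len(mentions) == 0:
        edges.append((tweeter, tweeter))           # self-loop preserves out-degree
    else:
        for m in mentions:
            edges.append((tweeter, m['screen_name']))

print(edges)   # [('bob', 'alice'), ('bob', 'carol'), ('dave', 'dave')]
```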

So, the sample code would be:
import pymongo

from pymongo import Connection

connection = Connection()
db = connection.occupywallstreet
print db.posts.count()

for post in db.posts.find({}, {'entities.user_mentions.screen_name':1, 'user.screen_name':1}).sort('user.screen_name', 1):
    if len(post['entities']['user_mentions']) == 0:
        # no mentions: print a self-loop so the tweeter's out-degree is captured
        print post['user']['screen_name'], post['user']['screen_name']
    else:
        for sname in post['entities']['user_mentions']:
            print post['user']['screen_name'], sname['screen_name']

buffer = ""
for post in db.posts.find({}, {'user.profile_image_url_https':1, 'user.screen_name':1}).sort('user.screen_name', 1):
    if buffer != post['user']['screen_name']:
        print post['user']['screen_name'], post['user']['profile_image_url_https']
    buffer = post['user']['screen_name']

It's pretty straightforward.  We connect to the database and perform a query that returns only the screen_names of those a user mentions and the screen_name of the tweeter himself.  This is accomplished with the following line:

for post in db.posts.find({}, {'entities.user_mentions.screen_name':1, 'user.screen_name':1}).sort('user.screen_name', 1):

.sort('user.screen_name', 1)
sorts the output so you have all of the activity per user in order.

The last loop gives me the profile image of each Twitter user.  My end goal is to visualize this network in NodeXL, and I will want to use each user's profile_image as the node's image.  Thus I iterate over all users and capture the profile_image_url_https value for each user with the following block of code:

buffer = ""
for post in db.posts.find({}, {'user.profile_image_url_https':1, 'user.screen_name':1}).sort('user.screen_name', 1):
    if buffer != post['user']['screen_name']:
        print post['user']['screen_name'], post['user']['profile_image_url_https']
    buffer = post['user']['screen_name']

When all is said and done I have all edges of my network along with the URLs of the profile_image for each user in the database that tweets.
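One simple way to hand that edge list to NodeXL is a small CSV file it can import.  This is a sketch with made-up edges; the 'Vertex 1'/'Vertex 2' headers match the column names of NodeXL's edge worksheet:

```python
# Hypothetical edges gathered from the database.
edges = [('bob', 'alice'), ('carol', 'alice'), ('dave', 'dave')]

lines = ['Vertex 1,Vertex 2']
for src, dst in edges:
    lines.append(src + ',' + dst)

out = '\n'.join(lines)
print(out)
# Vertex 1,Vertex 2
# bob,alice
# carol,alice
# dave,dave
```

Writing `out` to a file such as edges.csv gives NodeXL something it can open directly.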

Up next I'll share some visualizations I created with data I gathered using these methods.

Saturday, November 12, 2011

How to use Twitter's Filtered Streaming API, Python and MongoDB

When I started doing this I didn't see anywhere on the Internet that had the entire solution to the following problem:

"Track a hashtag following in Twitter and place into a MongoDB via Python".

For example, I needed to grab all Tweets that had the #occupywallstreet hashtag in them and place them in a Mongo Database using Python.

Why MongoDB?  It's easy, efficient and perfect for storing/performing queries on a large number of documents.  When the documents are Tweets encoded as JSON documents, it's even easier.

Why Python?  I had never used Python before but found nice and simple Twitter and MongoDB plugins that make this EASY.

So, to get to the meat of the problem, here is the code:

import pycurl, json
import pymongo
from pymongo import Connection

STREAM_URL = "https://stream.twitter.com/1/statuses/filter.json"
WORDS = "track=#occupywallstreet"
USER = "myuser"
PASS = "mypass"

connection = Connection()
db = connection.occupywallstreet

def on_tweet(data):
    # the stream occasionally sends blank keep-alive lines; skip anything that isn't JSON
    try:
        tweet = json.loads(data)
    except ValueError:
        return
    db.posts.insert(tweet)
    print tweet

conn = pycurl.Curl()
conn.setopt(pycurl.POST, 1)
conn.setopt(pycurl.POSTFIELDS, WORDS)
conn.setopt(pycurl.HTTPHEADER, ["Connection: keep-alive", "Keep-Alive: 3000"])
conn.setopt(pycurl.USERPWD, "%s:%s" % (USER, PASS))
conn.setopt(pycurl.URL, STREAM_URL)
conn.setopt(pycurl.WRITEFUNCTION, on_tweet)
conn.perform()

We're relying on Twitter's Streaming API to return our Tweets.  The options we are sending to pycurl produce the same effect as running the following command at the command prompt:

"curl -d track=#occupywallstreet https://stream.twitter.com/1/statuses/filter.json -umyuser:mypass"

The line:
db = connection.occupywallstreet
is where we make the connection to the Mongo database.  This requires that MongoDB is up and running and that I have created a database called occupywallstreet.  The command:
db.posts.insert(tweet)
places the JSON object into the database.  You can then query and search for tweets using MongoDB queries.  Please see Querying - MongoDB for more information on how to query the database and MongoDB for general MongoDB information.

You have to install the pycurl and pymongo modules for Python.  There are various ways to do this.  I used 'easy_install' to simply download and install them with essentially no effort.

A key point to making this code run without fault is found in the function on_tweet.  Looking at the callback function, we have to make our code resilient to the possible noise that can come back from Twitter.  If you have ever run 'curl' from the command line you will occasionally see the API return blank lines.  We need to account for these blank lines and other non-JSON values the API might return:
def on_tweet(data):
    try:
        tweet = json.loads(data)
    except ValueError:
        return
    db.posts.insert(tweet)
    print tweet

I print out all tweets just so I can verify the program continues to run.  I don't read the tweets, but if I fail to see tweets streaming across my terminal I know something went wrong.

And thus in just a few dozen lines of Python we have a nice program that stores all tweets containing the #occupywallstreet hashtag into a Mongo Database.

Monday, September 19, 2011

Why nothing will replace Facebook (at least not for a while)

Though I'm quite motivated in many aspects of my life:
  • Olympic trials qualifier in the marathon (extreme amount of work)
  • MS/PHD program while working full time
I'm still VERY lazy.  I do not think I am alone, especially when it comes to technology and web-based applications.  I'm hardly motivated to find all of the features of Gmail, Facebook and other such services.  I usually just wait until I stumble across someone who wants to show me how well they have mastered these tools and pick up tips.

For example, Gmail has a multiple-sign-in feature that allows you to sign in as several users at once.  I have two Gmail addresses.  I do not use this feature.  I have Safari and Firefox on my laptop and keep one account signed in within each browser, effectively signing in to both accounts via two windows.  Why do I do this?  Because I do not want to figure out how to use their feature.  I started to read about it, got tired after 5 seconds and decided my way was sufficient.

But I digress.  Slightly.

I do not think I am alone in my aversion to learning all of the features of a product I already use.  I'm tech-savvy and I still hate doing it when I'm not motivated.  I currently have 3 email addresses.  Part of this is due to work, where I cannot check my personal email, and checking my work email at home is a pain.  Thus I am forced to live 2 separate email lives.  It should stop there.  But I have a Gmail account that I am slowly converting to for all email traffic.  But (and this is embarrassing.  So embarrassing that my professor tried to kick me out of class for admitting it today) I still have an AOL account.


Why?  Because in order to move off of AOL completely I would need to email EVERYONE I know and ensure they never send email to my AOL account.  I'd also have to move saved mail from AOL to Gmail, move contacts, update the automatic online notifications I actually use (I could set up a forward, but then I'd also get all of the spam my AOL account receives) and occasionally check AOL for important emails from family.  We all have family members that just don't get the 'I changed email addresses' bit, right?

So it's a pain.  And laborious.  And that is what it would take to migrate off of Facebook.  All of my friends are on Facebook.  All of my pictures are on Facebook.  Facebook is now also an email system of sorts, and I (gasp!) save emails there in an effort to keep some important things rather than write them down.  If I migrated to Google+ I, and all of my friends, would have to re-friend everyone, copy all the pictures I might want, re-establish the same lists via circles and re-create my virtual life.

But I'm lazy.  I'm not going to do it.  Maybe the rest of the Comp. Sci. students in the world, who are less lazy than I am, could decide to do this, but they're not doing it either.  Even if G+ is better.  Even if the UI is faster, more intuitive, integrates into the rest of the Google platform and blows Facebook out of the water.  There is something to be said about being 'first' in certain social computing spaces.  Facebook did overtake MySpace, but I believe MySpace did a poor job covering the different aspects of what social media needed to do.

MySpace didn't provide the API Facebook did and, let's be honest, it was a little shady.  The default 'browsing' feature was to search for women between 18 and 35 who were single, viruses spread like the plague on MySpace, and it was WAY too easy to assume the identity of another person (a group of my friends all decided to be Chuck one day on MySpace.  They all had the same profile pic, info, name, music and background.  Unless you knew the actual URL of the real Chuck, you never knew who you were talking to).

But Facebook quickly incorporates any and every improvement Google+ makes, allowing users to have all the features of both products.  So, until lazy folks like myself die off or a social network that is tracked inside our conscious brains is invented, I do not see the masses leaving Facebook anytime soon.

Sorry Google, I really liked Google+


For an interesting visualization of the different lives I live via email please see this: