Saturday, October 20, 2012

Poortego: Intelligence for the 99%

Over the past few weeks I've had the pleasure of attending and presenting at SecTor and RSA Europe.  My presentation covered a project that I have been working on in my "spare" time, which I call "Poortego": an intelligence tool for the 99%.  The code and presentation materials can be accessed here:


Warning: this project is in its infancy and is still in initial development rather than a polished tool.

The premise of the project is that few tools fill the niche of threat intelligence, and many of those that do are quite expensive (e.g., Palantir and Analyst's Notebook).  One outlier is Maltego, which is "affordable" but has its limitations (particularly if you are using the Community Edition).  These limitations include that it is closed-source, relies out of the box on the Paterva servers (an issue for those with sensitive data), has limited export capability, and restricts the inputs to transform operations (to a single entity).  Note: Maltego is an excellent, mature tool in the intelligence space; the limitations I listed are not meant as a slight against the tool or the company.

Poortego is a completely free and open-source project written entirely in Ruby.  It leverages ActiveRecord for flexible backend support and Rex::UI for the command-line interface, and it can run as a stand-alone application or as a Metasploit plugin.
Poortego uses its own backend and framework for storage and data manipulation, with no reliance on other projects (e.g., Maltego or Metasploit).  It supports data transforms as well as importing and exporting data in different formats.  The bulk of my development time thus far has gone into the framework, so I have not yet spent much time on transform and import/export plugins; only a few are present in the initial code base.  Graphviz export is currently Poortego's only visualization component.  I've recently gotten turned on to neo4j and am investigating its use for storing and visualizing intelligence.  Much more to come!
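Poortego itself is written in Ruby and its internal schema isn't shown here, but the core idea of storing entities and links and then exporting them to Graphviz can be sketched briefly. The following is a minimal, hypothetical Python illustration (the `EntityGraph` class and `dot_quote` helper are made up for this example and are not Poortego's API) that emits DOT text which Graphviz can render, e.g. with `dot -Tpng intel.dot -o intel.png`.

```python
def dot_quote(text):
    """Quote a DOT identifier so dots, dashes, and spaces are legal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


class EntityGraph:
    """Tiny in-memory entity/link store (hypothetical, not Poortego's schema)."""

    def __init__(self):
        self.entities = {}  # entity name -> entity type, e.g. "domain" or "ip"
        self.links = []     # (source, target, label) tuples

    def add_entity(self, name, etype):
        self.entities[name] = etype

    def link(self, source, target, label=""):
        # Refuse links to entities that were never recorded
        for name in (source, target):
            if name not in self.entities:
                raise KeyError("unknown entity: " + name)
        self.links.append((source, target, label))

    def to_dot(self):
        """Render all entities and links in Graphviz DOT syntax."""
        out = ["digraph intel {"]
        for name, etype in sorted(self.entities.items()):
            label = dot_quote(name + " (" + etype + ")")
            out.append("  %s [shape=box, label=%s];" % (dot_quote(name), label))
        for source, target, label in self.links:
            out.append("  %s -> %s [label=%s];"
                       % (dot_quote(source), dot_quote(target), dot_quote(label)))
        out.append("}")
        return "\n".join(out)


g = EntityGraph()
g.add_entity("example.com", "domain")
g.add_entity("192.0.2.10", "ip")
g.link("example.com", "192.0.2.10", "resolves_to")
dot = g.to_dot()
print(dot)
```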

To illustrate the value of intelligence and Poortego's usage from both attacker and defender perspectives, I presented some demonstrations.

The first demonstration (defender perspective) came from the analysis of an incident impacting one of Zscaler's customers.  I observed some strange, unknown beaconing activity from the customer.  There was not much information on the URL/domain, but I was able to tie the IP address of the server to other domains that were related to a malware sample reported in open sources:


Furthermore, by taking the information on the malware sample and related domains, I was able to show a relationship to a ThreatExpert report on the 2008 targeted attacks against the Pentagon.
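The pivot used in this demonstration (from the server IP to other domains hosted on that IP) can be sketched as an inverted index over resolution records. This is a hypothetical Python illustration: the records are made-up example domains and documentation-range IPs, and in practice they would come from passive DNS or your own logs.

```python
from collections import defaultdict


def build_ip_index(resolutions):
    """Map each IP to the set of domains observed resolving to it."""
    index = defaultdict(set)
    for domain, ip in resolutions:
        index[ip].add(domain)
    return index


def pivot_on_ip(domain, resolutions):
    """Return the other domains that share at least one IP with `domain`."""
    index = build_ip_index(resolutions)
    ips = {ip for d, ip in resolutions if d == domain}
    related = set()
    for ip in ips:
        related |= index[ip]
    related.discard(domain)  # don't report the starting domain itself
    return sorted(related)


# Hypothetical passive-DNS style (domain, ip) records
records = [
    ("beacon.example", "198.51.100.7"),
    ("sample-c2.example", "198.51.100.7"),
    ("other-c2.example", "198.51.100.7"),
    ("unrelated.example", "203.0.113.5"),
]
print(pivot_on_ip("beacon.example", records))
```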


Note: all of the link graphs are Graphviz exports from Poortego.

The attacker-perspective demonstration that I presented staged an attack against the RSA Europe conference using nothing more than intelligence.  I wrote a Poortego transform to retrieve, parse, and store presentation, speaker, moderator, panelist, and company relationships from the RSA EU Event Catalog.  Obviously one could further exploit knowledge of these relationships for social-network enumeration, but I wanted to do something less obvious.  The Event Catalog also included all of the presentations in PDF format, so I wrote another Poortego transform to retrieve the PDF files, run ExifTool on them to extract the author information, and add an author relationship to each presentation.  It was interesting to see how many presentations had an author different from the speaker; there were two major outliers in this respect...

Unfortunately, I was just informed that there were some complaints regarding this particular demo; in order not to fuel the fire, I'm redacting this section of the post.  There was no ill intent, and all information used was OSINT.  It goes to show how sensitive the security industry becomes the moment potential offensive tactics are shown.

While I am presenting a project/tool (Poortego), I also stress that it is not tools that create intelligence but analysts/people.  Tools can certainly help facilitate, though!  Please reach out to me if you are interested in contributing to the project; there is still a lot of work to do to make this a well-polished tool.

Wednesday, September 19, 2012

A CVE-2012-4969 ("MS IE 0-Day") Seen In The Wild

Here is an example of a page that we have observed serving the CVE-2012-4969 exploit in the wild:

hxxp:// invitation [.] spacegas [.] com /Join-Id.html

The page itself appears to be a virtual meeting request (e.g., WebEx or JoinMe):

This is possibly a social-engineering lure sent over email, as no HTTP Referer headers were observed in the transactions.

The source of the page includes a trailing iframe pointing to a page (join.html) directly on the same IP (211.237.20.39):


This join.html page serves the CVE-2012-4969 exploit code:


"Moh2010.swf" is a common file-name seen related to this particular threat (just do a Google search for the file).  The MD5 of the Flash file that we pulled down was: 501cf420b5495874d6c795804ce21fd8, and is also encrypted with the DoSWF encryptor.

We ran the exploit and the malware embedded in the Flash file in the Anubis sandbox, which generated the following report: here.
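To check whether a locally captured copy matches the Moh2010.swf sample described above, hashing it is enough. Below is a minimal Python sketch (the function names are illustrative). Because the file is DoSWF-encrypted, the MD5 identifies this particular packed wrapper, and a re-encrypted copy of the same exploit would hash differently.

```python
import hashlib

# MD5 reported in this post for the Moh2010.swf sample we pulled down
REPORTED_MD5 = "501cf420b5495874d6c795804ce21fd8"


def md5_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large captures don't have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reported_sample(path, expected=REPORTED_MD5):
    """True if the file's MD5 equals the expected hash (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```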

In the report we see a registry key created:
HKU\S-1-5-21-842925246-1425521274-308236825-500\Software\Microsoft\Internet Explorer\International\CpMRU

And a mutex set:
_SHuassist.mtx

Further community analysis shows that someone uploaded the dropped malware to VirusTotal, which identifies it as a Delphi RAT (possibly a Hupigon variant), reported here.

Thursday, September 13, 2012

A Possible Fake Obama Page

I noticed this site being accessed this morning:


www [dot] obama2012 [dot] com

It certainly looks like a legitimate domain, but it fell into that "uncategorized" bucket.

Currently, visiting the page redirects you to a Google search for "barak Obama"; notice the misspelling of "Barack" as well as the inconsistent capitalization of the name.  Needless to say, this started looking suspect.

The domain was registered using Whois Privacy Protection and leverages name servers from FABULOUS.COM:


FABULOUS.COM provides a domain parking service, though several of its domains seen in the past have had poor reputations, e.g., 


The site is currently resolving using a round-robin from Savvis:




Further analysis of the IPs in the round-robin turns up open-source reports of their use in phishing and malware schemes in the past, though this is more a reflection of past abuse on Savvis.

If/when this domain is un-parked and used, it will be interesting to see what type of content it serves.  Furthermore, as the election draws near, it'll be interesting to track the malware and fraud schemes capitalizing on the event.


Thursday, August 16, 2012

Aggressive Activity on 146.185.255.41

I've noticed an increase in activity to 146.185.255.41 resulting from JS inclusions on mass-compromised WordPress sites.  For example, the site "tvtattle.com" has an included file on its main page, "tvtattle.com/wp-includes/js/l10n.js".

This file contains a simple unescape / document.write JS script:

The script includes content from a site hosted on 146.185.255.41:
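Scripts using this unescape / document.write pattern can be decoded offline without running them in a browser. The sketch below is a hypothetical Python re-implementation of JavaScript's `unescape()`: it handles `%XX` and `%uXXXX` escapes and leaves malformed escapes alone, as the browser does. The sample string is made up and is not the code from the compromised site.

```python
import re

# %uXXXX (four hex digits) or %XX (two hex digits); anything else is literal
_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")


def js_unescape(text):
    """Emulate JavaScript's unescape(): %uXXXX and %XX become characters."""
    def repl(match):
        code = match.group(1) or match.group(2)
        return chr(int(code, 16))
    return _ESCAPE.sub(repl, text)


# Hypothetical obfuscated snippet (not taken from the compromised site)
encoded = "%3Cscript%20src%3D%22http%3A//192.0.2.1/x.js%22%3E%3C/script%3E"
print(js_unescape(encoded))
```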

A number of the domains involved in this campaign on 146.185.255.41 leverage dynamic DNS providers, such as:

gododosoasoeow.epac.to
bolorofodosozo.freetcp.com
lolrotofodocoz.faqserv.com
ribojedrgzwidini.epac.to
nopotorolokolo.faqserv.com
colorotogowosodo.qpoe.com
uhlakalveibara.1dumb.com
uhlakalveibara.faqserv.com
lolrotofodocoz.2waky.com
holopopopopto.faqserv.com
uhlakalveibara.ddns.info
...
Given the large number of domains seen (>80), it is likely that domain generation and rotation logic is being used in this campaign.
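One quick way to see the rotation in the list above is to group the hostnames by their left-most label: several labels recur under different dynamic DNS providers. Below is a minimal Python sketch using only the domains listed above. It shows label reuse across providers and does not attempt to reverse-engineer any generation algorithm.

```python
from collections import defaultdict


def group_by_label(domains):
    """Group dynamic-DNS hostnames by their left-most label."""
    groups = defaultdict(set)
    for domain in domains:
        label, _, provider = domain.lower().partition(".")
        groups[label].add(provider)
    return groups


observed = [
    "gododosoasoeow.epac.to", "bolorofodosozo.freetcp.com",
    "lolrotofodocoz.faqserv.com", "ribojedrgzwidini.epac.to",
    "nopotorolokolo.faqserv.com", "colorotogowosodo.qpoe.com",
    "uhlakalveibara.1dumb.com", "uhlakalveibara.faqserv.com",
    "lolrotofodocoz.2waky.com", "holopopopopto.faqserv.com",
    "uhlakalveibara.ddns.info",
]

# Print only labels that appear under more than one provider
for label, providers in sorted(group_by_label(observed).items()):
    if len(providers) > 1:
        print(label, sorted(providers))
```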

CleanMX has observed and listed a handful of involved domains here.
Unfortunately, my attempts thus far to replay transactions have resulted in 0-byte 200 responses or redirects to another page with a 0-byte response (i.e., nothing useful).  Scumware.org lists some of the sites involved as "Trojan.JS.Redirector.cq."  Based on the logs that I've seen, it appears to be a redirector campaign leading to Blackhole; if/when I receive any related samples I will update this post.

For the most part, the IP address and domains are not in most block lists, so I wanted to make a note of this activity.








Thursday, June 28, 2012

CIF-Lite: Customizing CIF to your schema

This post documents how to customize CIF to use your own data handling / storage methodology.  My particular customization I call "CIF-Lite" since it focuses on cutting down on storage requirements and simplifies the database layout.  If you are interested in my specific customization of CIF: the code and installation instructions are available at https://github.com/mgeide/cif-lite.  However, this blog post will provide details for performing your own customization as well.

By the way, it is past time that CIF have a logo ;)

For those unfamiliar with the CIF project, it stands for Collective Intelligence Framework (CIF) - it is an excellent project being spear-headed by the REN-ISAC (the .EDU information sharing and analysis center).  The project is fully open-source and provides a flexible framework for automating the pulling down, parsing, storage/archiving, analysis, and retrieval of information from data-feeds.  For example, no doubt in your security environment you have scripts that grab data from the likes of ZeusTracker, MalwareDomainsList, and whatever other data sources you're interested in - and then do things like applying the data to your blocklists.  CIF allows you to perform these actions without having to write any code, just write a config (.cfg) file and voila! you have your datafeed integrated into CIF.  There are other great features too, such as tracking analyst searches in CIF and defining data-sharing restrictions ... but this post isn't about CIF features (you can read up on them on the CIF website), it is about customizing the back-end CIF data handling and storage.

The rest of this post assumes that you have a working install of CIF v0.01 (note: this entire post refers to this version of CIF, I have not checked out the Beta versions that are under development yet) and are interested in customizing the storage of the data that is auto-magically pulled down and parsed as the result of your CIF cfg files that you have built.

Without question CIF is a great project and has a growing user-base following it within the security community - for example, integration of ELSA with CIF.  From using CIF I have found that the storage back-end may not be ideal for all environments - and I've heard similar feedback from a few other analysts using CIF.  CIF does a number of things by default in its storage that may not be ideal in some environments:
  • CIF will store all data records as new records even if the record previously existed in the feed the last time it was pulled.  A number of feeds, like Alexa, are large, and storing records over and over again can become expensive and unnecessary for some.
  • CIF will store raw data about each record as an IODEF text field in the database - the IODEF format contains extraneous text and fields (read: bytes/storage) that again can become expensive and unnecessary for some.
  • CIF makes use of separate tables for various data "impacts" for indexing purposes (e.g., botnet domain, phishing domain, etc.).  These additional tables help speed up the CIF query time, but there are also storage expenses and potentially complexity issues querying across multiple tables in the database.
  • There are a few other CIF storage gotchas too, like storing the datafeed source as a UUID without a lookup table.  So you need to calculate the UUID for every source to work out which one a particular UUID corresponds to.
Note: none of the above bullet-points are a bash against its design or development -- these particular points were not ideal for me and I was interested in seeing about better understanding and customizing the CIF back-end.

Fortunately, everything within the CIF code base is modular.  The main script in your running CIF instance, which is scheduled to run regularly from cron, is:
/opt/cif/bin/cif_crontool
This script loops through the feeds defined in your CIF .cfg files, running each one through:
/opt/cif/bin/cif_feedparser
This script parses (CIF::FeedParser->parse) and stores (CIF::FeedParser->process) your data feed using the CIF framework logic in:
/opt/cif/lib/CIF/FeedParser.pm
This module uses its own CIF::FeedParser logic for parsing and the CIF::Archive logic for storing.

To customize how data is stored in your instance, make a copy of /opt/cif/bin/cif_feedparser to be your custom data handler (e.g., /opt/cif/bin/cif_feedparser_custom).  At the bottom of the file you will see the call to CIF::FeedParser->process(); add a parameter called "function" and point it to your own custom function for handling the data records that CIF parses out of your datafeeds:


Note: there is a bug in /opt/cif/lib/CIF/FeedParser.pm that impacts the use of a custom handling function.  I have reported it to Wes / CIF development - but in the meantime I have a fix for it here.

Within your custom handling function (CIF::Lite::insert_records in my case), you handle the receipt of the CIF records and configuration, and are then free to iterate over the records and normalize and store the data however you wish.  E.g.,


In my particular storage schema I used first/last seen to show a timeframe for repeating records versus creating new records and I use a lookup table for things like source and impact.
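CIF-Lite itself is written in Perl (see the repository for the real schema), but the first/last-seen plus lookup-table idea can be sketched in a few lines. The following Python/SQLite example uses assumed, hypothetical table and column names (`source`, `impact`, `indicator`): a repeat sighting widens the first_seen/last_seen window instead of inserting a new row.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS source (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS impact (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS indicator (
    address    TEXT NOT NULL,
    source_id  INTEGER NOT NULL REFERENCES source(id),
    impact_id  INTEGER NOT NULL REFERENCES impact(id),
    first_seen TEXT NOT NULL,
    last_seen  TEXT NOT NULL,
    PRIMARY KEY (address, source_id, impact_id)
);
"""


def lookup_id(db, table, name):
    """Return the id for `name` in a lookup table, inserting it if new.
    `table` is only ever one of our fixed table names, never user input."""
    db.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    return db.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()[0]


def store_record(db, address, source, impact, seen):
    """Insert a new indicator, or widen first/last seen on a repeat sighting."""
    src = lookup_id(db, "source", source)
    imp = lookup_id(db, "impact", impact)
    db.execute(
        "INSERT OR IGNORE INTO indicator "
        "(address, source_id, impact_id, first_seen, last_seen) VALUES (?, ?, ?, ?, ?)",
        (address, src, imp, seen, seen),
    )
    # ISO-8601 date strings compare correctly as text
    db.execute(
        "UPDATE indicator SET first_seen = MIN(first_seen, ?), last_seen = MAX(last_seen, ?) "
        "WHERE address = ? AND source_id = ? AND impact_id = ?",
        (seen, seen, address, src, imp),
    )


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
for day in ("2012-06-01", "2012-06-02", "2012-06-03"):
    store_record(db, "bad.example", "zeustracker", "botnet domain", day)
print(db.execute("SELECT address, first_seen, last_seen FROM indicator").fetchall())
```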

After you're happy with your back-end customization, you can modify how cif_crontool is called in your crontab to use your custom feedparser script using the "-C" option, e.g.,

You can then query your custom database directly and/or write your own CIF client tools for extracting data out of the database based on your new schema.  For example, here is my client. Hope my experience helps out anyone interested in customizing the data handling/storage functionality of their CIF instance.

Friday, May 18, 2012

A Bro script to extract artifacts from HTTP

The past few days I've been revisiting Bro (it has been a while) for analysis and specific tasks on traffic dumps.  Specifically, I was interested in carving out artifacts of interest (i.e., executables).  Built into the base install of Bro is the "protocols/http/file-extract.bro" script, which allows you to redefine the "extract_file_types" variable to pull files matching a specific MIME type out of HTTP sessions.  However, I wanted a more flexible Bro script that also extracts files that match magic bytes or are requested from a URL with a specific file extension, along with whitelisting functionality so that Windows Update downloads, etc. are not constantly being stored to disk.  I finally have something that I'm fairly happy with and wanted to share it with other budding Bro users.

Side note: for me, Bro has worked best when run against pcap files rather than carving directly off the wire.  Bro seems to drop too many packets for me when run on the wire, but I haven't looked into tuning the performance.


##! HTTP Artifact extraction script by mgeide

@load ./main
@load ./file-ident
@load base/utils/files

module HTTP;

export {
  # NOTICE Type
  redef enum Notice::Type += {
    Exe_File_Capture,
  };

  # File Magic Bytes to look for
  const file_magic_bytes = /^\x50\x45\x00\x00/ &redef;
  redef HTTP::file_magic_bytes += /^\x4D\x5A/ &redef;

  # MIME types to look for
  const extract_mime_types = /application\/x-dosexec/ &redef;
  redef HTTP::extract_mime_types += /application\/x-executable/;
  redef HTTP::extract_mime_types += /application\/x-msdownload/;
  redef HTTP::extract_mime_types += /application\/exe/;
  redef HTTP::extract_mime_types += /application\/x-exe/;
  redef HTTP::extract_mime_types += /application\/dos-exe/;
  redef HTTP::extract_mime_types += /application\/x-winexe/;
  redef HTTP::extract_mime_types += /application\/msdos-windows/;
  redef HTTP::extract_mime_types += /application\/x-msdos-program/;

  # File extensions to look for
  const extract_file_extensions = /\.[eE][xX][eE]$/ &redef;
  redef HTTP::extract_file_extensions += /\.[sS][cC][rR]$/;

  # Size constraints of file to extract (TODO)
  const minimum_size = 10240 &redef; # 10K
  const maximum_size = 8388608 &redef; # 8MB  

  # URL patterns to whitelist
  const whitelist_url_patterns = /^http:\/\/[^\/]*\.windowsupdate\.com\// &redef;
  redef HTTP::whitelist_url_patterns += /^http:\/\/[^\/]*\.microsoft\.com\//;
  redef HTTP::whitelist_url_patterns += /^http:\/\/[^\/]*\.google\.com\//;

  # Information to track throughout session
  redef record Info += {
    extraction_prefix: string &optional;
    extraction_file:   file &log &optional;
    extract_file:      bool &default=F;
    extracted_size:    count &default=0;
  };
}

event http_entity_data(c: connection, is_orig: bool, length: count, data: string) &priority=-5
{
  # Ignore client communication
  if ( is_orig )
    return;

  # If in first chunk of data
  if ( c$http$first_chunk )
  {
    # Get the URL for whitelisting and extension matching
    local url = build_url_http(c$http);

    # Check for file magic byte matches
    if ( HTTP::file_magic_bytes in data )
    {
      c$http$extraction_prefix = "magic-match";
      c$http$extract_file = T;
    }
    # Check for MIME type matches (mime_type is &optional, so test that it is set first)
    else if ( c$http?$mime_type && HTTP::extract_mime_types in c$http$mime_type )
    {
      c$http$extraction_prefix = "mime-match";
      c$http$extract_file = T;
    }
    # Check for file extension matches
    else if ( HTTP::extract_file_extensions in url )
    {
      c$http$extraction_prefix = "extension-match";
      c$http$extract_file = T;
    }
    # Content Disposition HTTP Header String - TODO?    

    # If a magic byte, MIME, or Ext match...        
    if ( c$http$extract_file )
    {
      # Check against whitelist
      if ( HTTP::whitelist_url_patterns in url )
      {
        c$http$extract_file = F;
      }
      else
      {
        # Open file to capture data
        local suffix = fmt("%s_%d.dat", is_orig ? "orig" : "resp", c$http_state$current_response);
        local fname = generate_extraction_filename(c$http$extraction_prefix, c, suffix);
        c$http$extraction_file = open(fname);
        enable_raw_output(c$http$extraction_file);
        c$http$extracted_size = 0;

        local message = fmt("Storing %s to %s", url, c$http$extraction_file);
        local method  = "UNKNOWN";
        if ( c$http?$method )  # I ran into some cases where method was unset  
          method = c$http$method;

        NOTICE([$note=Exe_File_Capture,
                $msg=message,
                $conn=c,
                $method=method,
                $URL=url]);
      }
    }
  } # End first chunk if

  # Do the capture when a capture file is open
  if ( c$http?$extraction_file )
  {
    print c$http$extraction_file, data;
    c$http$extracted_size += length;
  }

} # End HTTP entity data


event http_end_entity(c: connection, is_orig: bool)
{
  if ( c$http?$extraction_file )
  {
    close(c$http$extraction_file);

    ## TODO: remove extracted files that don't meet size constraints
    #if ( c$http$extracted_size > HTTP::maximum_size ||
    #     c$http$extracted_size < HTTP::minimum_size )
    #{
    #       local cmd = fmt("rm %s", c$http$extraction_file);
    #       system(cmd);
    #}
  }
}
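One way to handle the size-constraint TODO without shelling out to `rm` from Bro is a small post-processing pass over the extraction directory. This is a hypothetical Python sketch: it assumes the extracted `.dat` files end up in a single directory that you pass in, and it mirrors the script's `minimum_size` / `maximum_size` values. By default it does a dry run and only lists what it would remove.

```python
import os

MIN_SIZE = 10240      # 10K, mirrors HTTP::minimum_size in the Bro script
MAX_SIZE = 8388608    # 8MB, mirrors HTTP::maximum_size


def prune_extracted(directory, min_size=MIN_SIZE, max_size=MAX_SIZE, dry_run=True):
    """List (and, unless dry_run, delete) extracted files outside the size bounds."""
    flagged = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        size = os.path.getsize(path)
        if size < min_size or size > max_size:
            flagged.append(name)
            if not dry_run:
                os.remove(path)
    return flagged
```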

Thursday, May 17, 2012

"Dropped Fruit" - The lowest hanging fruit


The other day I was asked to do a very brief, high-level analysis of billions of client web logs (proxy logs) to quickly identify any infections. The logs are transactional in nature, without any content: "Client made a request looking like this" and "Server made a response looking like this from the request."  Furthermore, these particular logs lacked most of the security tagging of transactions that Zscaler does, i.e., no signature or threat fingerprinting information was logged for any of the transactions.  There were a number of directions that I could have gone with this task, but given the time and resource constraints that I was under - I ultimately took a very low-hanging fruit approach that paid off and provided valuable results to the client.  This brief post is a reminder to analysts to consider the low hanging fruit, even if it's not always the most exciting threat information.

Here is the approach that I took:

  1. Extract all transactions that lacked an HTTP "Referer" header AND (were to a domain name that did not resolve OR had a negative server response [no response or 4xx/5xx])
  2. From the results in step 1, get a count of transactions for each unique client IP, URL pair (uniq -c) and do a descending sort (sort -n -r) so that the pairs that repeated the most are at the top
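The two steps above were done with command-line tools (`uniq -c`, `sort -n -r`), and the same logic can be sketched in Python. The record layout here (client IP, URL, Referer, whether the domain resolved, HTTP status with "000" for no response) is an assumption for illustration, since the actual proxy log format isn't shown.

```python
from collections import Counter


def is_dropped(record):
    """Step 1: no Referer AND (domain did not resolve OR negative/no response)."""
    _, _, referer, resolved, status = record
    if referer not in ("", "-"):
        return False
    negative = status == "000" or status.startswith(("4", "5"))
    return (not resolved) or negative


def dropped_fruit(records, top=10):
    """Step 2: count (client IP, URL) pairs, most repeated first."""
    counts = Counter((r[0], r[1]) for r in records if is_dropped(r))
    return counts.most_common(top)


# Hypothetical parsed records: (client_ip, url, referer, resolved, status)
logs = [
    ("10.0.0.5", "http://dead-c2.example/gate.php", "-", False, "000"),
    ("10.0.0.5", "http://dead-c2.example/gate.php", "-", False, "000"),
    ("10.0.0.5", "http://dead-c2.example/gate.php", "-", False, "000"),
    ("10.0.0.8", "http://gone.example/cfg.bin", "-", True, "404"),
    ("10.0.0.8", "http://gone.example/cfg.bin", "-", True, "404"),
    ("10.0.0.9", "http://news.example/", "-", True, "200"),
    ("10.0.0.9", "http://news.example/missing.png", "http://news.example/", True, "404"),
]
for (client, url), n in dropped_fruit(logs):
    print(n, client, url)
```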

What I am pulling out of the logs are the most frequently repeated, directly requested pages that are down: either the domain no longer resolves or the web server can no longer serve the particular request (e.g., it no longer acts as a web server, or the page is not found).  In other words, I'm looking for infected hosts beaconing out to downed botnet controllers (what I'm calling "dropped fruit").  As I said above, maybe not the most exciting threat information, but it does provide a very quick sketch of the security posture of the environment.  The security community and Internet infrastructure providers (registrars and hosters) have gotten increasingly better at working together; the result is that every day, thousands of malicious domains are delisted and malicious infrastructure is taken offline.

Taking the malicious infrastructure down does not, however, clean up the infected hosts; as security analysts we should leverage this "dropped fruit" to notify customers of these infections.  As an organization, you should still care about these infections regardless of whether the C&C is down: infected hosts may have backdoors open, may have security controls disabled (such as A/V or Windows Update), may still be carrying out commands received earlier (scan and sploit, spamming, DDoS, open proxy/relay), may have other malware installed as part of the botnet (rented/sold botnets), and/or may be communicating with another piece of the control infrastructure that is still live.  The non-botnet transactions surfaced by this approach also have their merits, such as uncovering misconfigurations and typos.

There are numerous other low hanging fruit approaches that I didn't cover here - but many of these other approaches stem around the concept of building known sets (sets of known IPs/netblocks, domains, or URLs) of interest.  Along the same lines of this "Dropped Fruit" approach, other things to consider are:

  • Transactions to sinkholes.  These can be uncovered through domain analysis, but there is also plenty of information that can be pulled from the web (I'm not going to list them here so as not to draw attention to any security researchers' sinkholes, but Google is your friend).
  • Transactions to parked IPs or pages. For example, transactions to a suspended or delisted GoDaddy/HostGator site will resolve to or redirect you to specific pages.  Also, dynamic DNS providers (such as DynDNS and No-IP) will resolve unregistered sub-domains to specific IPs. Also, parked domains may resolve to RFC1918 or bogon address space.
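Checking where a domain resolves can be partly automated. The Python sketch below uses the standard `ipaddress` module to flag RFC1918 answers and other special-purpose (non-global) space; the `parking_ips` set is a hypothetical placeholder that you would fill from your own research. It only covers reserved ranges, not unallocated bogons, and the exact `is_global` results for a few special ranges vary slightly between Python versions.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def classify_resolution(ip_text, parking_ips=frozenset()):
    """Label a resolved address that is unlikely to be a live public web server."""
    ip = ipaddress.ip_address(ip_text)
    if ip_text in parking_ips:
        return "parked"           # known parking/sinkhole IP from your own list
    if any(ip in net for net in RFC1918):
        return "rfc1918"          # private space returned for a public domain
    if not ip.is_global:
        return "special-purpose"  # loopback, unspecified, link-local, reserved, etc.
    return "routable"
```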

Here is a small snippet of my "dropped fruit" log analysis output (with client IPs and anything customer-specific redacted):


The number on the left is the count of the unique client IP / URL pair (the client IP has been blacked out), and the number in the last column is the server response.  Those servers that did not respond either because the domain no longer resolved or it was no longer acting as a web server had a response code of "000."  You can see a number of Conficker, Zeus, and TDSS transactions in the log snippet above.  

Be sure to look out for the "dropped fruit" in your environment.