Resources
Search Optimisation
Learning how to search quickly and effectively is vital for any analyst, and should be the first skill they learn. We recommend learning the search operators used by Google and other sites (for example, site: to limit results to one domain, filetype: to find particular document types, and quotation marks for exact phrases), and trying a variety of search engines.
Data Cleaning
Much of the data we deal with is in text files (including CSVs) and spreadsheets, and often you'll need to clean and structure your data. The more you learn about processing data in Excel, the more time you'll save in the long run. You can also change some options in Word and Excel to make yourself more efficient.
If you often need to do the same cleaning steps, then a VBA macro will save you a lot of time. This also means that your macro code becomes, in effect, an audit trail of your cleaning. However, not all workplaces will allow macros, and they can take a while to set up, so it depends on the situation. It's best to learn all the tools available so you know which one is best for the job at hand.
Sometimes your data is so disorganised that it's not even ready to put into Excel. Usually, this is because you have a problem with your delimiters, and Excel can't tell where one field or row ends and the next begins. Delimiters may appear in the middle of values, be missing entirely, or be used inconsistently.
If you have this problem, you can fix it in Excel, but it's usually quicker and easier to repair the delimiters using Find & Replace in Notepad, Notepad++ or Word, and then do any further cleaning in Excel. If this happens a lot in your role, you can use macros in Word or Notepad++, too.
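If you find yourself repairing the same delimiter problems over and over, the Find & Replace step can also be scripted. Here's a minimal Python sketch of that idea - the file names and the substitutions are placeholders for whatever your data actually needs:
# Normalise inconsistent delimiters (e.g. stray semicolons and tabs) into commas
# so Excel can parse the file cleanly. File names are placeholders.
with open("messy.txt", "r", encoding="utf-8") as f:
    text = f.read()

# The same substitutions you would otherwise do with Find & Replace:
text = text.replace(";", ",").replace("\t", ",")

with open("cleaned.csv", "w", encoding="utf-8") as f:
    f.write(text)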
These should get you through most data quality issues, but if you have a very large data set, or very serious quality problems, you'll need something a bit more powerful.
REGEX
Regular Expressions (Regex) are one of the best tools for processing large sets of 'messy' data, as well as running complex searches. They are supported in many different applications - including Microsoft Word (in Find and Replace, if you turn 'Wildcards' on), Excel (in VBA, as an optional reference) and Notepad++. Regex is confusing to look at, and the syntax differs slightly between applications, so you'll want to make sure you learn the right codes for the tool you're using. However, it's worth learning because it is very powerful. It's great for transforming large sets of data into a single, structured and consistent format.
For example, consider we have a list of thousands of phone numbers, each entered in a different format:
01-123-45678, 2 12 345678, 312345678, 412345678, 5-1 2345678 ...(and so on).
We could use a whole lot of complex macro code to make them all the same - or sort them into different 'categories', writing separate formulas to fix each different scenario. Or, using Regex in Python, we can quickly make all these phone numbers conform to the same format, like so:
import re

raw_txt = ["01-123-45678", "2 12 345678", "312345678", "412345678", "5-1 2345678"]
for x in raw_txt:
    # Capture the leading digit and the remaining digit groups, then rebuild as 0X-XXXXXXXX
    x = re.sub(r"^[0]?(\d)[- ]?(\d{1,8})[- ]?(\d{1,8})?$", r"0\g<1>-\g<2>\g<3>", x)
    print(x)
It looks confusing, but it'll clean our list into:
01-12345678, 02-12345678, 03-12345678, 04-12345678, 05-12345678 …(and so on).
You can imagine how useful this would be if you had a list of thousands of phone numbers! So, as you can see, it's a great alternative to doing things the slow way - even if it's confusing. To get started, try these links:
- Test Regex using Regex101 (make sure to change to the correct 'Flavor')
Website Analysis
For information about a given website, the starting point should be a WHOIS search, though modern records typically have limited information - so you should seek out historical ones if you can. You may wish to conduct a TRACERT as well. It's also worthwhile conducting Image and Video Analysis (see below) on any content, and searching text content to see if it has appeared on other sites.
- Historical WhoIS via various services such as WhoISRequest (free, but partial data), DRS (paid), WhoIS History API (paid), or Domain Tools (paid), or freely via the WayBack Machine (see this guide from NixIntel)
- TRACERT online using traceroute-online.com
- Website Popularity using Similarweb Analytics
- Geo-IP lookup using Check-Host.net
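Most of the time the services above will do the job, but if you need to run WHOIS lookups in bulk, the protocol itself is simple enough to script. Here's a rough Python sketch using only the standard library; the domain is a placeholder, and whois.verisign-grs.com is assumed as the server because it handles .com domains (other TLDs use different WHOIS servers):
import socket

def whois_query(domain, server="whois.verisign-grs.com"):
    # Send a raw WHOIS query over port 43 and return the text response.
    with socket.create_connection((server, 43), timeout=10) as sock:
        sock.sendall((domain + "\r\n").encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

print(whois_query("example.com"))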
People Analysis
What method you use to search for people will depend on what you normally have access to, what country you're based in, and the task at hand. Generally, a standard web search should always be one of your first checks. Afterwards, we recommend searching usernames, emails, phone numbers and any other identifiers. You can do this via standard searches or on specialised people search engines. Then you can build out from there depending on what you find - getting more specific, delving into online profiles, or utilising other techniques.
As for specialised people search engines, there are lots of options, but the good ones are typically paid services. Some free ones are available, but many will show you fake results to bait you into signing up, or only have good coverage in one or two countries.
In some countries there are public records which can be a good and reliable source of information, but because these are locally specific, we have not listed them here. You'll want to find out the best ones for your country and add them to your own process.
Some general starting points are:
- Data breach searches via haveibeenpwned or dehashed (requires sign-up), which can tell you where you might find an account. Please note that we do not advise using data breaches as part of OSINT analysis.
- People search via Radaris
Social Media
How you conduct social media analysis will depend on whether you are interested in a person, a group, a place, a term, a typology, or something else.
If conducting social media analysis, we recommend using a separate, dedicated device, with a VPN, a sandboxed enhanced-privacy browser, and a covert account. You may also wish to use TOR. This ensures that your investigative activity is not connected to any other activity.
Social media analysis is a complex topic, but here are some useful starting points:
- OPSEC Tradecraft guide from ScopeNow
- Username generator from JimPix
- Twitter search using Waybackmachine (see the sketch after this list) or analytics using foller.me
- Facebook Posts using whopostedwhat or Sowsearch or the Plessas Facebook Matrix
- Instagram full-res profile photos using save-free or inflact or instadp.io
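A few of the links above lean on the Wayback Machine. If you need to check archived copies of many profiles or posts, its availability API can be scripted. Here's a minimal sketch using only the Python standard library, with a placeholder target URL:
import json
import urllib.parse
import urllib.request

# Placeholder target; ask the Wayback Machine for its closest snapshot of the page.
target = "https://twitter.com/example_user"
api = "https://archive.org/wayback/available?url=" + urllib.parse.quote(target, safe="")

with urllib.request.urlopen(api, timeout=10) as resp:
    data = json.load(resp)

snapshot = data.get("archived_snapshots", {}).get("closest")
if snapshot:
    print(snapshot["url"], snapshot["timestamp"])
else:
    print("No snapshot found")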
Image and Video Analysis
There are lots of great tools online for image and video analysis. However, your first step should always be checking the file metadata. 'Exif' metadata can be very useful; it can contain the date and time the media was created, what device created it, and sometimes even the GPS coordinates at the time. Because this can be so revealing, many people clear the Exif data, and some sites will automatically 'strip' Exif from any uploaded content to protect users' privacy. Nonetheless, it should always be a part of your process.
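As a quick illustration, here's a minimal sketch of pulling Exif tags out of a photo with the Pillow library (a recent version of Pillow is assumed, and photo.jpg is a placeholder file name):
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

# Read Exif metadata from an image file (file name is a placeholder).
img = Image.open("photo.jpg")
exif = img.getexif()

# Print every standard Exif tag by name.
for tag_id, value in exif.items():
    print(TAGS.get(tag_id, tag_id), value)

# GPS data, if present, sits in its own IFD (tag 0x8825).
for tag_id, value in exif.get_ifd(0x8825).items():
    print(GPSTAGS.get(tag_id, tag_id), value)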
Reverse image and video searches are also hugely important. You can find out whether the media is unique (and potentially new) or whether it's been around for a long time. If you're lucky, the media is neither totally unique nor widespread, and the reverse search can help you find links you wouldn't find otherwise. It can also help you find higher-quality versions of the same media. Sometimes you may be able to find the original source, or a copy which has the metadata intact. So, this should always be a part of your process as well.
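The reverse searching itself happens on the search engines' side, but a related local technique is perceptual hashing, which lets you check whether two files are visually the same image even after resizing or recompression. A rough sketch, assuming the third-party ImageHash package and placeholder file names:
from PIL import Image
import imagehash  # third-party package: pip install ImageHash

# Compare two files that may be copies of the same image (placeholder names).
hash_a = imagehash.phash(Image.open("original.jpg"))
hash_b = imagehash.phash(Image.open("downloaded_copy.jpg"))

# A small Hamming distance means the images are almost certainly the same.
print(hash_a - hash_b)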
Before getting into geolocation of images or video, you should start with metadata and reverse image searches as detailed above. Once that's done, you need to work through a process of gradually narrowing down the candidate locations. Geolocation is an art in itself, and too big a topic to cover here, but here are a few good starting points.
Academic Papers
Your employer likely has access to some academic databases already, but if not, you can still access huge volumes of 'open' research papers.
Note that we do not recommend sci-hub as using the site may breach copyright law.
Public Registers and Data
Many countries have public registers which can contain useful information. Note that many of these registers are 'scraped' by third-party providers, some of which will pollute your search results without adding any value. Some providers will scrape registers, process the data, and assist you with linking entities - however the utility of those products will depend on your work.
Australia
New Zealand
- MBIE Companies Register
- MBIE Full Register List (click 'All' in the top-right)
- DIA list of AML/CFT Reporting Entities
- Maritime NZ Ship Register
- Professional Registers for:
  - PSPPI Holders (Security Guards, Private Investigators, etc)
Reading Recommendations
Last, but not least, we would like to share some books that we have found interesting, or that have prompted some useful discussions.
We don't necessarily agree with everything they say, but, they've certainly been worth reading.