Integration to read text and URLs from an image

Hi,

Is there an integration, like Swimlane Utilities, that can process an image to extract its text and detect URLs in it, either for an image attached to a record or for an image at an external URL?

The use case is email phishing triage. Does this functionality exist?

Well, there hasn't been much participation so far, but I did some research and got a Python script working. It reads the text in an image file, detects URLs in that text and, if a URL is a bit.ly short link, also gets the expanded URL through the bit.ly API.

import cv2
import pytesseract
import re
import requests

# bit.ly API token
TOKEN = "XXXXXXXXXXXXXXXXXXXXXXXXX"

def FindURL(string):
    # Return every URL-like substring in the text. The pattern has several
    # capture groups, so findall() returns tuples; item 0 is the full match.
    regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
    url = re.findall(regex,string)
    return [x[0] for x in url]

def expandURL_bilty(url=''):
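    # The bit.ly API expects the bitlink without a scheme (e.g. "bit.ly/PaYY"),
    # so strip a leading "http://" or "https://" if present.
    # split('//', 1)[1] keeps the part after the scheme (split() alone returns a list).
    bitly_url = url.split('//', 1)[1] if '//' in url else url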
    
    # construct the request headers with authorization
    headers = {"Authorization": f"Bearer {TOKEN}", 'Content-Type': 'application/json'}
    json_link = {"bitlink_id": bitly_url }

    # ask the bit.ly v4 /expand endpoint for the original long URL
    response = requests.post("https://api-ssl.bitly.com/v4/expand", json=json_link, headers=headers, timeout=10)
    link = 'Fail'
    if response.status_code == 200:
        # if the response is OK, get the expanded (long) URL
        link = response.json().get("long_url")
    
    return link

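img = cv2.imread('test2.jpg')
if img is None:
    # cv2.imread() returns None instead of raising when the file can't be read
    raise FileNotFoundError("Could not read image file test2.jpg")

# OCR the image, then flatten the result onto a single line
text = pytesseract.image_to_string(img)
text_oneline = text.replace('\n', ' ')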
print(text_oneline)

urls = FindURL(text_oneline)
e_urls = []
for url in urls:
    e_urls.append(url)
    # for bit.ly short links, also record the URL they actually point to
    if 'bit.ly' in url:
        e_url = expandURL_bilty(url)
        e_urls.append(e_url)

print('URLs : ',e_urls)

As an example for this code, you could download this image

or any other: just search Google for “phishing sms image url bitly”, download one, and rename it to test2.jpg (or change the file name in the code). With this example image, the script prints the following output:

Text Message Today 10:25 am  Your :Winner ID: ~ won 6.500.000M Pounds in Donating Pro, To claim goto hitp://bit.ly/PaYY, click CLAIM enter your Winning Ref#: AG7414DQ 

URLs :  ['bit.ly/PaYY', 'http://www.onedsk.com/reward/']
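In the output above, Tesseract read `http://` as `hitp://`, so FindURL only captured `bit.ly/PaYY` without a scheme (the second entry is the long URL returned by the bit.ly expansion). A small pre-processing step could repair such misreads before calling FindURL. This is a minimal sketch: `fix_ocr_schemes` and the set of characters it treats as confusable with `t` are hypothetical choices, not part of the original script.

```python
import re

# Hypothetical assumption: OCR sometimes reads the "t" in "http" as
# "i", "l", "1" or "|", which hides the scheme from the URL regex.
_SCHEME_MISREAD = re.compile(r"\bh[til1|][til1|]p(s?)://", re.IGNORECASE)


def fix_ocr_schemes(text: str) -> str:
    """Rewrite likely OCR misreads of http:// or https:// (e.g. 'hitp://')."""
    # Normalize to lowercase "http"/"https", keeping the optional "s"
    return _SCHEME_MISREAD.sub(
        lambda m: "http" + m.group(1).lower() + "://", text
    )
```

It would be called as `FindURL(fix_ocr_schemes(text_oneline))`. Keep in mind that the bit.ly API expects a bitlink without the scheme (such as `bit.ly/PaYY`), so a repaired `http://bit.ly/...` link has to have its scheme stripped again before it is sent to the `/v4/expand` endpoint.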

Does anyone have an idea how to integrate this code into a Swimlane bundle, integration, or widget, so that it reads an image attached to a Swimlane record? :thinking:
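One possible starting point is to separate the OCR step from the parsing step and work on raw image bytes instead of a file on disk, since an attachment or a downloaded image is more likely to arrive as bytes. The sketch below assumes the integration can hand over the image content as `bytes`; how to fetch a record attachment through Swimlane is not shown, and `analyze_image` and `ocr_image_bytes` are hypothetical names. The `expand` parameter could be `expandURL_bilty` from the script above.

```python
import re
from typing import Callable, Dict, List, Optional

# Same URL pattern as FindURL() in the script above
URL_PATTERN = re.compile(
    r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
)


def ocr_image_bytes(data: bytes) -> str:
    """OCR raw image bytes (e.g. an attachment's content) with OpenCV + Tesseract."""
    # Imported lazily so the parsing logic can be used and tested
    # without OpenCV, NumPy or Tesseract installed.
    import cv2
    import numpy as np
    import pytesseract

    # Decode the in-memory bytes instead of reading a file with cv2.imread()
    img = cv2.imdecode(np.frombuffer(data, dtype=np.uint8), cv2.IMREAD_COLOR)
    if img is None:
        raise ValueError("data is not a decodable image")
    return pytesseract.image_to_string(img)


def analyze_image(
    data: bytes,
    ocr: Callable[[bytes], str] = ocr_image_bytes,
    expand: Optional[Callable[[str], str]] = None,
) -> Dict[str, object]:
    """Return the OCR text, the URLs found in it, and expanded bit.ly links."""
    text = ocr(data).replace("\n", " ")
    # findall() returns tuples of groups; item 0 is the full URL match
    urls: List[str] = [match[0] for match in URL_PATTERN.findall(text)]
    expanded: Dict[str, str] = {}
    if expand is not None:
        for url in urls:
            if "bit.ly" in url:
                expanded[url] = expand(url)
    return {"text": text, "urls": urls, "expanded": expanded}
```

For example, `analyze_image(raw_bytes, expand=expandURL_bilty)` would return the flattened OCR text, the list of URLs and a mapping from each bit.ly link to its long URL. Mapping that dictionary onto record fields is again an assumption about how the integration would be set up. Passing `ocr` as a parameter also makes it possible to test the parsing without running Tesseract.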

Thanks for the update!

Thanks for sharing, I’ll take a stab at it :slight_smile:
