url
Event fields used to define and normalize metadata about a URL/URI. The community is often ambiguous about the difference between a URL and a URI. Granted, a URL would normally include the domain, port (if applicable), user, password, query, fragment, and URI.
However, in many log sources it is not possible to tell whether a value is the full URL or just the URI.
URL data appears in various log sources, as defined in http.md, as well as in other applications such as SIP. URLs, especially in HTTP, have best-practice implementations; however, adhering to them is not required for connections or data to be established.
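The ambiguity between a full URL and a bare URI can be illustrated with a small heuristic. The following is a minimal sketch using Python's standard `urllib.parse`; the function name `looks_like_full_url` and the sample values are hypothetical and not part of this data model. It assumes a value counts as a "full URL" only when both a scheme and a host are present.

```python
from urllib.parse import urlsplit


def looks_like_full_url(value: str) -> bool:
    """Heuristic: a full URL carries both a scheme and a host (netloc)."""
    parts = urlsplit(value)
    return bool(parts.scheme and parts.netloc)


# Hypothetical sample values, as they might appear in two different log sources.
full_url = "https://user:secret@example.com:8443/a/b.php?x=1#top"
uri_only = "/a/b.php?x=1#top"

full_parts = urlsplit(full_url)
uri_parts = urlsplit(uri_only)
```

Both values yield the same path, query, and fragment, but only the first yields a scheme, host, port, and credentials. A log source that records only the second form cannot be upgraded to a full URL without other fields such as the HTTP Host header.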
Attributes
Name |
Type |
Description |
Sample Value |
---|---|---|---|
url_category |
string |
The defined grouping of a URL (or possibly just of the domain in the URL) according to what it is (e.g. adult, news, advertising, parked domains) |
|
url_extension |
string |
The file extension without the leading “.” (e.g. dll, php, zip, msi, txt) |
|
url_fragment |
string |
The portion of the URL after the first “#”, as defined in https://tools.ietf.org/html/rfc3986#section-3.5. This is also referred to as the “hash” in some implementations. This value does not always exist |
|
url_hostname |
string |
The domain/host/hostname of the URL. This could be an IP address or any other value, but is most likely a domain/hostname |
|
url_original |
string |
The entire URL combined together, and/or the URL in its truest form from the log source. Some log sources already parse portions of the URL into their respective fields; others do so but also include the “original” URL. Always try to include this field: HTTP/URLs never truly have to conform to any RFC or implementation, so any parsing/logging implementation could contain any number of assumptions or mistakes. It is therefore best to keep an original value |
|
url_path |
string |
Everything beginning with and after the first “/” that follows the hostname (and port, if present). This portion should usually exist in the log source/URL, even if the path is just “/”. If the query or fragment has not been parsed out yet, still include them in this value |
|
url_port |
integer |
The port in the URL. This is not to be confused with destination.md. In your ETL pipeline, check that the value derived from the URL is actually an integer (unless it is properly verified in the data source), because, as mentioned throughout, URLs can be manipulated or mis-implemented in many different ways |
|
url_query_names |
string |
The keys/fields derived from the query. Because URL implementations vary without limit, providing a nested object of key/values is not recommended. Whether an attacker is injecting data into a URL or the implementation is incorrect or malicious, keys/fields and values could contain anything you can imagine (e.g. “%%)%#Nf…$2f>hr…n fa.fa s\”\jhrwq”: “somevalue”) |
|
url_query_values |
string |
The values derived from the query. Because URL implementations vary without limit, providing a nested object of key/values is not recommended. Whether an attacker is injecting data into a URL or the implementation is incorrect or malicious, keys/fields and values could contain anything you can imagine (e.g. “%%)%#Nf…$2f>hr…n fa.fa s\”\jhrwq”: “somevalue”) |
|
url_scheme |
string |
Defines the network location (e.g. smtp, ftp, smb, ldap). This portion may not exist in many log sources. This is usually the value that comes before the first “://”. This is also referred to as URN/origin |
|
url_user_name |
string |
The username defined in the URL. This is meant to be distinguished from, for example, the value in the Authorization header of an HTTP request (or even the Proxy-Authorization HTTP header). This value should be copied to any.md |
|
url_user_password |
string |
The password defined in the URL. This is meant to be distinguished from, for example, the value in the Authorization header of an HTTP request (or even the Proxy-Authorization HTTP header) |
|
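
The attributes above could be populated in an ETL step along the following lines. This is a minimal sketch, not a reference implementation: the function name `normalize_url` is hypothetical, it relies on Python's standard `urllib.parse`, and it assumes that query names and values are emitted as two flat lists rather than a nested object. `url_category` is left out because it requires an external categorization source. Note that `parse_qsl` percent-decodes names and values, and that `hostname` is lowercased by the parser, so `url_original` remains the only untouched copy.

```python
import posixpath
from urllib.parse import parse_qsl, urlsplit


def normalize_url(original: str) -> dict:
    """Map a raw URL string onto the url_* fields (illustrative only)."""
    # Always keep the untouched value, whatever happens during parsing.
    fields = {"url_original": original}
    try:
        parts = urlsplit(original)
    except ValueError:
        # For example a malformed IPv6 literal: keep url_original only.
        return fields

    try:
        port = parts.port  # raises ValueError unless an integer in 0-65535
    except ValueError:
        port = None  # do not store a non-integer port

    # Flat lists of names and values, in order of appearance.
    pairs = parse_qsl(parts.query, keep_blank_values=True)

    candidates = {
        "url_scheme": parts.scheme,
        "url_hostname": parts.hostname,
        "url_port": port,
        "url_user_name": parts.username,
        "url_user_password": parts.password,
        "url_path": parts.path,
        "url_extension": posixpath.splitext(parts.path)[1][1:],
        "url_query_names": [name for name, _ in pairs],
        "url_query_values": [value for _, value in pairs],
        "url_fragment": parts.fragment,
    }
    # Only emit components that are actually present.
    fields.update(
        {key: val for key, val in candidates.items() if val not in (None, "", [])}
    )
    return fields
```

Keeping names and values as separate lists avoids creating arbitrary, attacker-controlled field names in the data store, at the cost of losing the direct pairing between a name and its value.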