Table of Content
Forge.AI is a cloud-based data and knowledge platform that powers the analytical, model driven organization. We create real time structured knowledge and data feeds out of the chaotic ocean of unstructured data that flows across the globe and throughout your organization. Our customers can explore, discover, monitor and model the knowledge and information captured in that unstructured data in fundamentally new and powerful ways enabling the rapid delivery of high-impact, data driven models that create a competitive advantage.
Forge.AI amplifies the quantity of data science your organization can conduct. We eliminate the time, expense and technology challenges associated with trying to create and sustain a data pipeline that incorporates the at scale transformation of unstructured data into algorithmically relevant structured data. We fundamentally accelerate the speed at which your data scientist can, explore hypothesis, iterate on and deploy models that deliver business value.
Forge.AI creates a new paradigm in terms of the quality and sophistication of data science your organization can conduct. Historically, data science conducted with unstructured data has been “shallow,” meaning model inputs tend to be simple (e.g. word counts) and model outputs paint with a broad brush (e.g. traditional sentiment analysis). Forge.AI brings unparalleled fidelity to the identification, extraction, resolution and semantic framing of the interaction between the entities, events and relationships expressed in unstructured data. This fidelity enables a new level of sophistication and time series modeling and analysis within your organization. See more. See faster. Predict better.
Forge.AI collects publicly available information in real time and transforms it into a resource that: (i) is suitable for a wide assortment of machine learning and artificial intelligence applications, enhancing data science environments; (ii) improves automated decision making and reasoning processes and (iii) enhances internal knowledge bases. The data resources created by Forge.AI are made available to our customers in either a streaming format or addressable through a SQL enabled cloud data warehouse (Snowflake).
Our customers simply “plug into” the Forge.AI data environment and immediately, transformed and enriched data is available to our customers.
To give a sense as to what is happening “under the hood”: a brief abstract of the Forge.AI processing platform is detailed below:

Collection: Our collection environment operates in real time 7x24x365. Source information is collected from tens of thousands of national, local, and industry specific news sites, organizations web sites, different US and international sites (e.g. the SEC), social media, and private feeds and / or feeds that you provide directly to us. Specially designed algorithms continually learn and optimize the process by which we collect source information. We abide by all politeness clause and other requirements in our collection process. If there is ever a source of data we are not collecting that is relevant to you we will work with you and that data source to incorporate into our ecosystem.
Perceive: Forge.AI natural language understanding environment extracts all relevant entities, events and relationship. In addition, it extracts a host of meta information and generates AI informed “calculated data” derived from source documents directly and/or in conjunction with our knowledge base.
Reason and Enrich: Forge.AI’s semantic, temporal and probabilistic knowledge base works in concert with our natural language understanding technologies to resolve all relevant data. Furthermore, the knowledge base enables contextual awareness for enhanced analytics such as dependency analysis etc.
The Forge.AI knowledge base is where knowledge that is extracted and learned by Forge.AI in its processing of information persists. The knowledge base is a dynamic and ever-changing environment that learns and grows every day as the world changes.
The knowledge base is also where Forge.AI maintains the various ontologies we support. An overview of the primary ontologies is listed below. It is worth noting that Forge.AI adds to the ontologies we support both organically and at the request of customers:
- Domain Specific Ontologies
- Corporate Structure
- People and Corporate Relationships
- Business Concepts
- Financial Concepts
- Marketing Concepts
- Risk (Operational Risk, Credit Risk, Market Risk)
- Science (Physics, Chemistry, Biology, etc.)
- Military and National Security
- Events
- Custom Ontologies / Ontologies as Requested
Data Output and Syndication: Data is syndicated to our customer environment in one of two ways.
- Option One: Customers connect with our real time cloud-based data warehouse. This data document which follows, describes how the data is managed in that cloud data warehouse. It is intended to orient the customer to nature of the data managed in the data warehouse, its structure, and how to query the data and integrate it into your operations.
- Option Two: Customers connect with our real-time streaming data environment. This environment is conceptually similar in structure our cloud-based data warehouse; however the representation is in XML or JSON and the delivery mechanism uses HTTPS 2.0 Push. Separate documentation is available which describes this environment.
The format of the data that is syndicated to our customers is extensively described below. There are three “classes” of data extracted and generated by Forge.AI:
- Direct data that corresponds directly with a published document (e.g. news story, SEC filing, tweet, etc.);
- Calculated information that is derived from a published document, such as an automated LDA topic vectors representing the source information;
- Aggregate information derived by looking at a large collection of the corpus of processed data.
Connecting with the Forge.AI environment is simple, and an abstract is provided below. You simply choose to connect to our cloud-based data warehouse, or you choose to connect to our push API.

Schema

Diagram
Documents
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
The UUID for this document. |
| parentUuid |
VARCHAR(48) |
✔ |
If the document has a parent document, this field will represent the id of the parent document. |
| url |
VARCHAR(2048) |
✖ |
The resolved URL for the document. This may be the same as the original URL if there was no redirection. |
| originalUrl |
VARCHAR(2048) |
✔ |
The source URL before any redirection. |
| docType |
STRING |
✔ |
The docType is used to determine how the document is published. Current types = article secfiling |
| title |
STRING |
✔ |
The title of the document (potentially translated into English.) |
| originalTitle |
STRING |
✔ |
The original title of the document (if not English.) |
| docDateTime |
TIMESTAMP_NTZ |
✔ |
The later of the document publish date and last update. |
| collectDateTime |
TIMESTAMP_NTZ |
✔ |
The time the document was collected and processed. |
| summary |
STRING |
✔ |
Summary of the document text. |
Notes:
docDateTime is NULL if no document date could be identified for the document.
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- parentUuid:
- af802d69-1d95-44f2-9096-1b15c1d6ad4d
- url:
- http://www.tampabay.com/nation-world/two-nuns-stole-500000-for-trips-to-las-vegas-but-the-church-doesnt-want-them-prosecuted-20181210/
- orginalUrl:
- NULL
- docType:
- article
- title:
- Two nuns stole $500,000 for trips to Las Vegas. But the church doesn’t want them prosecuted.
- originalTitle:
- NULL
- docDatetime:
- 2018-12-10 13:50:20.000
- collectDatetime:
- 2019-06-08 00:37:46.000
- summary:
- Around the same time when Kreuper announced she was retiring earlier this year, a family at the school asked for a copy of an old check they had written to St. James. That was among the first clues that would unravel a vast fraud that was allegedly cond…
DocumentSource
The channel the document was collected through.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Documents(docId) |
| sourceId |
INTEGER |
✖ |
The id for the source channel. This is an internal value. |
| parentId |
INTEGER |
✖ |
Logical grouping id. |
| name |
STRING |
✖ |
The name of the channel the document was collected from. Example include “The Wall Street Journal”, and “Bloomberg”. |
| type |
STRING |
✔ |
“RSS feed”, “crawl”, etc. |
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- sourceId:
- 595
- parentId:
- 2
- name:
- St. Petersburg Times
- type:
- SITE
Relevant HTTP header information acquired during document collection.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Documents(docId) |
| name |
STRING |
✖ |
|
| value |
STRING |
✔ |
|
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- name:
- DESCRIPTION
- value:
- The two women, reportedly best friends, are accused of pilfering funds on trips and casino visits.
SimilarDocuments
A collection of references to documents that are closely identical to this document.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Documents(docId) |
| similarDocId |
VARCHAR(48) |
✖ |
The UUID of the similar document. |
| score |
FLOAT |
✔ |
A score from 0.0-1.0 reflecting the similarity of the documents. A score of 1.0 indicates the documents are identical. |
Notes:
similarDocId may not be present in the Documents table.
Example:
- docid:
- d445372b-f880-496f-86c1-cc496a39874c
- similarDocId:
- 395ad38f-3239-4623-9c34-a6514f5a07c4
- score:
- 0.949999988
Sections
The collection of sections constituting the document.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Documents(docId) |
| sectionId |
INTEGER |
✖ |
The document scope ID of the section. |
| parentId |
INTEGER |
✔ |
The parent section ID. Where appropriate this value is used to support representing an extracted document as a hierarchical collection of sections. |
| type |
VARCHAR(1) |
✔ |
An identifier for the type of information represented in the section. The possible types are: Text (T) |
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- sectionId:
- 1
- parentId:
- 0
- type:
- T
Any metadata associated with a section represented as name-value pairs.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Sections(docId, sectionId) |
| sectionId |
INTEGER |
✖ |
Foreign key to Sections(docId, sectionId) |
| name |
STRING |
✖ |
|
| value |
STRING |
✔ |
|
Example:
- docid:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- sectionId:
- 1
- name:
- date
- value:
- Published December 10 2018
SourceTexts
If the section is a text (T) section then this table will contain the source text.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Sections(docId, sectionId) |
| sectionId |
INTEGER |
✖ |
Foreign key to Sections(docId, sectionId) |
| title |
STRING |
✔ |
The optional title for the section (translated if necessary) |
| text |
STRING |
✔ |
The text for the section (translated if necessary) |
| originalTitle |
STRING |
✔ |
Original text for the title (if not English.) |
| originalText |
STRING |
✔ |
Original text for the section (if not English.) |
| summary |
STRING |
✔ |
Summary of the section text. If the section text is less than 10 sentences then there will be no summary. |
Notes:
title and text may be NULL because of licensing constraints.
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- sectionId:
- 1
- title:
- Two nuns stole $500,000 for trips to Las Vegas.
- text:
- But the church doesn’t want them prosecuted. An old check allegedly exposed decades of lies…
- originalTitle:
- NULL
- originalText:
- NULL
- summary:
- Around the same time when Kreuper announced she was retiring earlier this year, a family at the school asked for a copy of an old check they had written to St. James. According to what Monsignor Michael Meyers explained to parents last Monday at the meeting, the church launched an independent financial investigation after the oddity regarding the old check was discovered. She allegedly endorsed the checks with a stamp saying, “St. James Convent,” not “St. James School.” The two nuns then tapped the funds for their personal use, Meyers said.
Sentences
The set of extracted sentences from the text section.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Sections(docId, sectionId) |
| sectionId |
INTEGER |
✖ |
Foreign key to Sections(docId, sectionId) |
| sentenceId |
INTEGER |
✖ |
The document scope unique ID for the sentence. |
| text |
STRING |
✔ |
The sentence text. |
| startOffset |
INTEGER |
✔ |
Start offset in bytes into the section text for the sentence. |
| endOffset |
INTEGER |
✔ |
End offset in bytes into the section text for this sentence. |
Notes:
text may be NULL because of licensing constraints.
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- sectionId:
- 1
- text:
- Two nuns stole $500,000 for trips to Las Vegas.
- sentenceId:
- 0
- startOffset:
- 0
- endOffset:
- 47
Entities
The collection of entities discovered in the source document. An entity is named object or domain concept.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Documents(docId) |
| entityId |
INTEGER |
✖ |
The document scope unique id for the entity. |
| type |
VARCHAR(1) |
✖ |
An attribute used to identify the entity type as a simple (S) or complex (C) entity. A complex entity is composed of a set of other entities (e.g. CEO Sally Smith is composed of a title and a person element) while a simple entity consists of just one entity type. |
| label |
STRING |
✖ |
The extracted text associated with the entity. |
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- entityId:
- 5
- type:
- S
- label:
- Las Vegas
EntityComponents
The components are the constituent entities that collectively constitute the complex entity.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Entities(docId, entityId) |
| entityId |
INTEGER |
✖ |
Foreign key to Entities(docId, entityId) |
| componentId |
INTEGER |
✖ |
entity id into Entities of the component |
EntityAssertions
The extracted entities are disambiguated/resolved using the ForgeAI semantic knowledge base.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Entities(docId, entityId) |
| entityId |
INTEGER |
✖ |
Foreign key to Entities(docId, entityId) |
| assertedClass |
VARCHAR(2048) |
✖ |
The URI for the asserted semantic type. |
| predicate |
STRING |
✔ |
The predicate for the assertion such as “isA”, “ownerOf” etc. |
| confidence |
FLOAT |
✔ |
A value from 0.0-1.0 of the confidence in the assertion. |
Example:
EntityProperties
Entity properties intrinsic to the entity. Examples include “latitude=42.0016”, “CIK=320193”, etc.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Entities(docId, entityId) |
| entityId |
INTEGER |
✖ |
Foreign key to Entities(docId, entityId) |
| name |
STRING |
✖ |
|
| value |
STRING |
✔ |
|
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- entityId:
- 5
- name:
- latitude
- value:
- 36.17497
EntityLocation
The entity label’s location into the document.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Entities(docId, entityId) |
| entityId |
INTEGER |
✖ |
Foreign key to Entities(docId, entityId) |
| sentenceId |
INTEGER |
✖ |
The ID of the document sentence that the entity is located in. |
| startOffset |
INTEGER |
✖ |
The offset in bytes into the sentence to the start of the entity label. |
| endOffset |
INTEGER |
✖ |
The offset in bytes into the sentence to the end of the entity label. |
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- entityId:
- 5
- sentenceId:
- 0
- startOffset:
- 37
- endOffset:
- 46
DocumentTopics
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Documents(docId) |
| name |
STRING |
✖ |
The topic name (e.g. “finance”, “management”, “physics”, “sports”, etc.) |
| parent |
STRING |
✔ |
If the topic represents a hierarchy, then the parent is the name of the direct ancestor topic. |
| score |
FLOAT |
✔ |
A measure from 0.0-1.0 indicating the topic strength. |
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- name:
- fraud
- parent:
- NULL
- score:
- 0.979394794
DocumentTopicWords
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Documents(docId) |
| name |
STRING |
✖ |
Foreign key to DocumentTopics(name, parent) |
| parent |
STRING |
✔ |
Foreign key to DocumentTopics(name, parent) |
| word |
STRING |
✖ |
|
| value |
FLOAT |
✔ |
|
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- name:
- fraud
- parent:
- NULL
- word:
- investigation
- value:
- 0.00999999978
DocumentTopicModelVectors
Collection of document score topic model vectors that can be used for assessing document similarity, change detection, etc.
Comparisons should not be made between topic vectors generated using different topic models. Applications may demand multiple topic models drawn from different corpora. The set of topic vectors allows for associating multiple topic vector calculations with a text section.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Documents(docId) |
| modelName |
STRING |
✔ |
|
| modelVersion |
STRING |
✔ |
|
| dimensionIndex |
INTEGER |
✖ |
Topic vector index (0…n-1) |
| dimensionValue |
FLOAT |
✖ |
The contribution (0.0-1.0) that this dimension has to the document. |
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- modelName:
- general
- modelVersion:
- VanillaLda.model-identifier-979b20fdc5691083320d4a61c515e64e0b53ecf9-a6728cf46f462fc4e38b53e8420a386dbe5d75ad
- dimensionIndex:
- 3
- dimensionValue:
- 0.410293996
TopicModelVectors
Section level topic model vector.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Sections(docId, sectionId) |
| sectionId |
INTEGER |
✖ |
Foreign key to Sections(docId, sectionId) |
| modelName |
STRING |
✔ |
|
| modelVersion |
STRING |
✔ |
|
| dimensionIndex |
INTEGER |
✖ |
Topic vector index (0…n-1) |
| dimensionValue |
FLOAT |
✖ |
The contribution (0.0-1.0) that this dimension has to the section. |
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- sectionId:
- 1
- modelName:
- general
- modelVersion:
- VanillaLda.model-identifier-979b20fdc5691083320d4a61c515e64e0b53ecf9-a6728cf46f462fc4e38b53e8420a386dbe5d75ad
- dimensionIndex:
- 13
- dimensionValue:
- 0.0602470003
Relationships
A semantic association based on the ForgeAI relationship ontology between two entities in the document.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Documents(docId) |
| relationshipId |
INTEGER |
✖ |
The document scope unique id for the relationship. |
| subjectLabel |
STRING |
✖ |
The entity that is acting as the source in the directed relationship. In an undirected relationship, the source and object entities are interchangeable. |
| subjectEntityId |
INTEGER |
✔ |
The entity ID for this entity if found in the entities section. |
| predicateLabel |
STRING |
✖ |
The extracted entity that is acting as the predicate joining the subject and object entities. |
| objectLabel |
STRING |
✖ |
The entity that is acting as the object or destination in the directed relationship. In an undirected relationship the source and object entities are interchangeable. |
| objectEntityId |
INTEGER |
✔ |
The entity ID for this entity if found in the entities section. |
| confidence |
FLOAT |
✖ |
The calculated confidence of the extracted relationship. |
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- relationshipId:
- 1
- subjectLabel:
- Mary Margaret Kreuper
- subjectEntityId:
- NULL
- predicateLabel:
- isA
- objectLabel:
- Sister
- objectEntityId:
- NULL
- confidence:
- 0.96526194
RelationshipPredicateAssertions
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Relationships(docId, relationshipId) |
| relationshipId |
INTEGER |
✖ |
Foreign key to Relationships(docId, relationshipId) |
| assertedClass |
VARCHAR(2048) |
✖ |
The URI for the asserted semantic type. |
| predicate |
STRING |
✔ |
The predicate for the assertion such as “isA”, “ownerOf” etc. |
| confidence |
FLOAT |
✔ |
A value from 0.0-1.0 of the confidence in the assertion. |
RelationshipSupportingSentences
Information identifying the locations in the document supporting the relationship assertion.
| NAME |
TYPE |
NULLABLE? |
COMMENT |
| docId |
VARCHAR(48) |
✖ |
Foreign key to Relationships(docId, relationshipId) |
| relationshipId |
INTEGER |
✖ |
Foreign key to Relationships(docId, relationshipId) |
| sentenceId |
INTEGER |
✖ |
The document scope ID for the supporting sentence. |
Example:
- docId:
- 03afceb2-8369-44b0-9909-3d50bb51e385
- relationshipId:
- 1
- sentenceId:
- 4
Sample Queries
Number of documents collected by sites over the past week.
SELECT ANY_VALUE(name) AS name,
REGEXP_REPLACE(URL, 'https?:\/\/(www\.)?([^\/]+).*', '\\2') AS site,
COUNT(*) AS cnt
FROM Articles.Documents
JOIN Articles.DocumentSource USING (docId)
WHERE url LIKE 'http%'
AND collectDatetime >= DATEADD(DAY, -7, CURRENT_DATE())
GROUP BY site
ORDER BY cnt DESC
Documents mentioning a company of interest.
Knowing the company’s URI
SELECT DISTINCT docId, url, title
FROM Articles.Documents
JOIN Articles.EntityAssertions USING (docId)
WHERE assertedClass = 'http://forge.ai/company/apple_inc'
Using the company’s CIK
SELECT DISTINCT docId, url, title
FROM Articles.Documents
JOIN Articles.EntityProperties USING (docId)
WHERE name = 'cik' AND value = '320193'
Documents mentioning a company of interest within the past 7 days.
SELECT DISTINCT docId, url, title
FROM Articles.Documents
JOIN Articles.EntityAssertions USING (docId)
WHERE assertedClass = 'http://forge.ai/company/apple_inc'
AND docDatetime >= DATEADD(DAY, -7, CURRENT_DATE())
Number of documents per day mentioning a company of interest within the past 7 days.
SELECT TO_DATE(docDateTime) AS date, COUNT(*) AS cnt
FROM Articles.Documents
JOIN Articles.EntityAssertions USING (docId)
WHERE assertedClass = 'http://forge.ai/company/apple_inc'
AND docDatetime >= DATEADD(DAY, -7, CURRENT_DATE())
GROUP BY 1
Documents mentioning a company of interest from a specific source.
SELECT DISTINCT docId, url, title
FROM Articles.Documents
JOIN Articles.DocumentSource USING (docId)
JOIN Articles.EntityAssertions USING (docId)
WHERE assertedClass = 'http://forge.ai/company/apple_inc'
AND name = 'CNN'
Documents mentioning 2 companies of interest.
SELECT DISTINCT docId, url, title
FROM Articles.Documents
JOIN Articles.EntityAssertions a1 USING (docId)
JOIN Articles.EntityAssertions a2 USING (docId)
WHERE a1.assertedClass = 'http://forge.ai/company/apple_inc'
AND a2.assertedClass = 'http://forge.ai/company/alphabet_inc.'
Or
SELECT DISTINCT d.docId, url, tile
FROM Articles.Documents d
JOIN Articles.EntityAssertions a1 USING (docId)
JOIN Articles.Entities e2 USING (docId)
JOIN Articles.EntityAssertions a2 ON (a2.docId = e2.docId AND a2.entityId = e2.entityId)
WHERE a1.assertedClass = 'http://forge.ai/company/apple_inc'
AND e2.label = 'WallaMe'
AND a2.assertedClass = 'http://forge.ai/entity#Organization'
Documents classified with a topic of interest.
SELECT DISTINCT docId, url
FROM Articles.Documents
JOIN Articles.DocumentTopics USING (docId)
WHERE name = 'bankruptcy';
Relationships for a person of interest.
SELECT LOWER(predicateLabel), LOWER(objectLabel), MAX(confidence) AS confidence, COUNT(*) AS COUNT
FROM Articles.Relationships
WHERE LOWER(subjectLabel) = 'tim cook'
GROUP BY 1,2
ORDER BY 4 DESC
Documents that cluster (in one dimension of the LDA space) around a company of interest.
WITH LDA AS (
SELECT *
FROM Articles.DocumentTopicModelVectors
WHERE modelVersion = 'VanillaLda.model-identifier-979b20fdc5691083320d4a61c515e64e0b53ecf9-a6728cf46f462fc4e38b53e8420a386dbe5d75ad'
), Apple AS (
SELECT dimensionindex, AVG(dimensionvalue) AS dimensionvalue
FROM LDA
WHERE EXISTS (SELECT 1
FROM Articles.EntityAssertions a
WHERE a.docId = LDA.docID
AND assertedClass = 'http://forge.ai/company/apple_inc')
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1)
SELECT docid, url, title
FROM LDA
JOIN Apple ON (Apple.dimensionindex = LDA.dimensionindex AND ABS(Apple.dimensionvalue - LDA.dimensionvalue) < 0.005)
JOIN Articles.Documents d USING (docId)
WHERE NOT EXISTS (SELECT 1
FROM Articles.EntityAssertions a
WHERE a.docId = d.docId
AND assertedClass = 'http://forge.ai/company/apple_inc')
Documents that cluster (in the LDA vector space) around a company of interest.
WITH LDA AS (
SELECT *
FROM Articles.DocumentTopicModelVectors
WHERE modelVersion = 'VanillaLda.model-identifier-979b20fdc5691083320d4a61c515e64e0b53ecf9-a6728cf46f462fc4e38b53e8420a386dbe5d75ad'
), Apple AS (
SELECT dimensionindex, AVG(dimensionvalue) AS dimensionvalue
FROM LDA
WHERE EXISTS (SELECT 1
FROM Articles.EntityAssertions a
WHERE a.docId = LDA.docId
AND assertedClass = 'http://forge.ai/company/apple_inc')
GROUP BY 1)
SELECT docid, url, title, SUM(SQUARE(NVL(lda.dimensionvalue, 0) - NVL(Apple.dimensionvalue, 0))) AS d
FROM LDA
FULL OUTER JOIN Apple USING (dimensionindex)
WHERE NOT EXISTS (SELECT 1
FROM Articles.EntityAssertions a
WHERE a.docId = LDA.docId
AND assertedClass = 'http://forge.ai/company/apple_inc')
GROUP BY 1
HAVING d < 0.01;
List all 8-K filings for Apple (CIK=3571) in 2018Q3
WITH meta AS (
SELECT docid,
MAX(CASE WHEN name = 'forgeai:sec-filing-type' AND value LIKE '8-%' THEN SUBSTR(value, 1, 3)
WHEN name = 'forgeai:sec-filing-type' THEN SUBSTR(value, 1, 4) END) AS filing,
MAX(CASE WHEN name = 'forgeai:sec-filing-year' THEN value::INT END) AS year,
MAX(CASE WHEN name = 'forgeai:sec-filing-qtr' THEN value::INT END) AS quarter,
MAX(CASE WHEN name = 'forgeai:company-cik' THEN value::INT END) AS cik
FROM SEC.DocumentMetadata
GROUP BY 1
)
SELECT docid, title
FROM SEC.Documents
JOIN meta USING (docid)
WHERE filing = '8-K'
AND year = 2018
AND quarter = 3
AND cik = 3571;
List all organizations mentioned in a Risk Factors section (1A) of a 10-K filing for Apple (CIK=3571)
SELECT smd.docid, url, label
FROM SEC.SectionMetadata smd
JOIN SEC.Entities USING (docid)
JOIN SEC.EntityAssertions USING (docid, entityid)
JOIN SEC.EntityLocations USING (docid, entityid)
JOIN SEC.Sentences USING (docid, sectionid, sentenceid)
JOIN SEC.Documents USING (docid)
WHERE name = 'forgeai:sec-section-type'
AND value = '1A'
AND EXISTS (SELECT 1
FROM SEC.DocumentMetadata d
WHERE d.docid = smd.docid
AND name = 'forgeai:company-cik'
AND value::INT = 3571)
AND assertedClass = 'http://forge.ai/entity#Organization';
Useful links
Table of Content
Forge.AI Overview
Forge.AI is a cloud-based data and knowledge platform that powers the analytical, model driven organization. We create real time structured knowledge and data feeds out of the chaotic ocean of unstructured data that flows across the globe and throughout your organization. Our customers can explore, discover, monitor and model the knowledge and information captured in that unstructured data in fundamentally new and powerful ways enabling the rapid delivery of high-impact, data driven models that create a competitive advantage.
Forge.AI amplifies the quantity of data science your organization can conduct. We eliminate the time, expense and technology challenges associated with trying to create and sustain a data pipeline that incorporates the at scale transformation of unstructured data into algorithmically relevant structured data. We fundamentally accelerate the speed at which your data scientist can, explore hypothesis, iterate on and deploy models that deliver business value.
Forge.AI creates a new paradigm in terms of the quality and sophistication of data science your organization can conduct. Historically, data science conducted with unstructured data has been “shallow,” meaning model inputs tend to be simple (e.g. word counts) and model outputs paint with a broad brush (e.g. traditional sentiment analysis). Forge.AI brings unparalleled fidelity to the identification, extraction, resolution and semantic framing of the interaction between the entities, events and relationships expressed in unstructured data. This fidelity enables a new level of sophistication and time series modeling and analysis within your organization. See more. See faster. Predict better.
Forge.AI Platform Abstract
Forge.AI collects publicly available information in real time and transforms it into a resource that: (i) is suitable for a wide assortment of machine learning and artificial intelligence applications, enhancing data science environments; (ii) improves automated decision making and reasoning processes and (iii) enhances internal knowledge bases. The data resources created by Forge.AI are made available to our customers in either a streaming format or addressable through a SQL enabled cloud data warehouse (Snowflake).
Our customers simply “plug into” the Forge.AI data environment and immediately, transformed and enriched data is available to our customers.
To give a sense as to what is happening “under the hood”: a brief abstract of the Forge.AI processing platform is detailed below:
Collection: Our collection environment operates in real time 7x24x365. Source information is collected from tens of thousands of national, local, and industry specific news sites, organizations web sites, different US and international sites (e.g. the SEC), social media, and private feeds and / or feeds that you provide directly to us. Specially designed algorithms continually learn and optimize the process by which we collect source information. We abide by all politeness clause and other requirements in our collection process. If there is ever a source of data we are not collecting that is relevant to you we will work with you and that data source to incorporate into our ecosystem.
Perceive: Forge.AI natural language understanding environment extracts all relevant entities, events and relationship. In addition, it extracts a host of meta information and generates AI informed “calculated data” derived from source documents directly and/or in conjunction with our knowledge base.
Reason and Enrich: Forge.AI’s semantic, temporal and probabilistic knowledge base works in concert with our natural language understanding technologies to resolve all relevant data. Furthermore, the knowledge base enables contextual awareness for enhanced analytics such as dependency analysis etc.
The Forge.AI knowledge base is where knowledge that is extracted and learned by Forge.AI in its processing of information persists. The knowledge base is a dynamic and ever-changing environment that learns and grows every day as the world changes.
The knowledge base is also where Forge.AI maintains the various ontologies we support. An overview of the primary ontologies is listed below. It is worth noting that Forge.AI adds to the ontologies we support both organically and at the request of customers:
Data Output and Syndication: Data is syndicated to our customer environment in one of two ways.
The format of the data that is syndicated to our customers is extensively described below. There are three “classes” of data extracted and generated by Forge.AI:
Connecting with the Forge.AI Platform
Connecting with the Forge.AI environment is simple, and an abstract is provided below. You simply choose to connect to our cloud-based data warehouse, or you choose to connect to our push API.
Schema
Diagram
Documents
This may be the same as the original URL if there was no redirection.
Current types =
articlesecfilingNotes:
docDateTimeisNULLif no document date could be identified for the document.Example:
DocumentSource
The channel the document was collected through.
Documents(docId)Example:
DocumentMetadata
Relevant HTTP header information acquired during document collection.
Documents(docId)Example:
SimilarDocuments
A collection of references to documents that are closely identical to this document.
Documents(docId)A score of 1.0 indicates the documents are identical.
Notes:
similarDocIdmay not be present in theDocumentstable.Example:
Sections
The collection of sections constituting the document.
Documents(docId)The possible types are: Text (
T)Example:
SectionMetadata
Any metadata associated with a section represented as name-value pairs.
Sections(docId, sectionId)Sections(docId, sectionId)Example:
SourceTexts
If the section is a text (
T) section then this table will contain the source text.Sections(docId, sectionId)Sections(docId, sectionId)If the section text is less than 10 sentences then there will be no summary.
Notes:
titleandtextmay beNULLbecause of licensing constraints.Example:
Sentences
The set of extracted sentences from the text section.
Sections(docId, sectionId)Sections(docId, sectionId)Notes:
textmay beNULLbecause of licensing constraints.Example:
Entities
The collection of entities discovered in the source document. An entity is named object or domain concept.
Documents(docId)S) or complex (C) entity.A complex entity is composed of a set of other entities (e.g.
CEO Sally Smithis composed of a title and a person element) while a simple entity consists of just one entity type.Example:
EntityComponents
The components are the constituent entities that collectively constitute the complex entity.
Entities(docId, entityId)Entities(docId, entityId)Entitiesof the componentEntityAssertions
The extracted entities are disambiguated/resolved using the ForgeAI semantic knowledge base.
Entities(docId, entityId)Entities(docId, entityId)Example:
EntityProperties
Entity properties intrinsic to the entity. Examples include “latitude=42.0016”, “CIK=320193”, etc.
Entities(docId, entityId)Entities(docId, entityId)Example:
EntityLocation
The entity label’s location into the document.
Entities(docId, entityId)Entities(docId, entityId)Example:
DocumentTopics
Documents(docId)Example:
DocumentTopicWords
Documents(docId)DocumentTopics(name, parent)DocumentTopics(name, parent)Example:
DocumentTopicModelVectors
Collection of document score topic model vectors that can be used for assessing document similarity, change detection, etc.
Comparisons should not be made between topic vectors generated using different topic models. Applications may demand multiple topic models drawn from different corpora. The set of topic vectors allows for associating multiple topic vector calculations with a text section.
Documents(docId)Example:
TopicModelVectors
Section level topic model vector.
Sections(docId, sectionId)Sections(docId, sectionId)Example:
Relationships
A semantic association based on the ForgeAI relationship ontology between two entities in the document.
Documents(docId)Example:
RelationshipPredicateAssertions
Relationships(docId, relationshipId)Relationships(docId, relationshipId)RelationshipSupportingSentences
Information identifying the locations in the document supporting the relationship assertion.
Relationships(docId, relationshipId)Relationships(docId, relationshipId)Example:
Sample Queries
Number of documents collected by sites over the past week.
Documents mentioning a company of interest.
Knowing the company’s URI
Using the company’s CIK
Documents mentioning a company of interest within the past 7 days.
Number of documents per day mentioning a company of interest within the past 7 days.
Documents mentioning a company of interest from a specific source.
Documents mentioning 2 companies of interest.
Or
Documents classified with a topic of interest.
Relationships for a person of interest.
Documents that cluster (in one dimension of the LDA space) around a company of interest.
Documents that cluster (in the LDA vector space) around a company of interest.
List all 8-K filings for Apple (CIK=3571) in 2018Q3
List all organizations mentioned in a Risk Factors section (1A) of a 10-K filing for Apple (CIK=3571)
Useful links