sqanything (AlaSQL Chrome Extension)

Get it

Visit the Chrome Web Store to add sqanything to Chrome.

What is sqanything?

Data tables are all over the internet, especially Wikipedia. But HTML tables can be difficult to download or copy-paste cleanly. The sqanything extension lets you query the HTML tables in all your open Chrome tabs with SQL.

You can also query tables in Notion, CSV or Excel files, and tables in Google Sheets. You can send the results of your query to Google Sheets or just send all the tables on a web page to Google Sheets.

Blocks

Each individual query, including the web pages that it queries from and the destination it sends to (if applicable), is called a "Block". When you launch the extension, you can see your past Blocks by selecting "View History".

When you select one of your blocks, it will automatically launch the web pages used in that query (including Wikipedia pages, CSV’s, Notion Pages, public Google Spreadsheets, etc.), so you can run the same query and get the same results. You can create a new Block by selecting "New SQL Block" after you launch the extension.

Schema

When you start or launch a Block, you will see a tab for "Schema" underneath the query text. This shows you the active HTML tables currently in your Chrome tabs. If you don’t see any tables, you can select "Refresh Schema" at the bottom.

Each web page in your Chrome tabs is assigned an "alias", such as "851nb3". All the tables on that web page are given the same alias, plus the order in which they appear on the web page (for example, the 5th table on the page would be "851nb3-5"). When you open a web page under your Schema, you can see all the tables on that page. When you open a specific table, you can see the number of columns, an example query, and the first 5 rows of that data table. The example query will look something like this: "SELECT * FROM HTML("#t851nb3-5", {headers: true}) LIMIT 5". You can copy and paste the example query into your query text in order to start querying from that table.

If you wish to remove a web page or data table from your Schema for a certain block (so that web page will no longer automatically launch when you select that block), you can select the trash can icon to the right of the web page or table.

Destination

Under the query text of a Block, there is also a tab for "Destination". Here, you can enter a Spreadsheet ID and Sheet name (and, optionally, cell location) to send your query result to. The Spreadsheet ID can be found in the Google Sheet URL. When you select "Run" after entering your destination, your query result will be sent to that Sheet.

If you want to simply send all the tables on a web page (or web pages) to Google Sheets, you can select "Extract HTML tables to Sheets" after launching the extension. Here you can enter the same information as above, and also a prepend to the Sheet name (for example "test" so each Sheet will be called "test_" + the table alias).

AlaSQL

This app uses AlaSQL. See the AlaSQL documentation for syntax questions.

Example queries

SELECT
[ISO 3166 Country Code] as code,
Country as country,
Latitude as lat,
Longitude as long

FROM CSV("https://cdn.rawgit.com/albertyw/avenews/master/old/data/average-latitude-longitude-countries.csv",{headers:true})
WHERE Latitude BETWEEN 0 AND 25
ORDER BY Longitude DESC

SELECT * FROM CSV('https://cdn.rawgit.com/albertyw/avenews/master/old/data/average-latitude-longitude-countries.csv',{headers:true})
WHERE Latitude BETWEEN 0 AND 25

SELECT * FROM CSV('https://drive.google.com/u/0/uc?id=1F-B8Z3YUizsOaBcdSPk-0lWVrC0mxYYs&export=download',{headers:true})

SELECT * FROM CSV('https://docs.google.com/spreadsheets/d/1zhfO8ANC-odz7IDMw065yHbQVPBjtrlo2ukysNlqLlo/gviz/tq?tqx=out:csv&sheet=Sheet1',{headers:true})

SELECT sum(Wins) FROM CSV('https://docs.google.com/spreadsheets/d/1zhfO8ANC-odz7IDMw065yHbQVPBjtrlo2ukysNlqLlo/gviz/tq?tqx=out:csv&sheet=Sheet1',{headers:true})

SELECT * FROM CSV('https://docs.google.com/spreadsheets/d/1zhfO8ANC-odz7IDMw065yHbQVPBjtrlo2ukysNlqLlo/gviz/tq?tqx=out:csv&gid=975578095',{headers:true})

Google Sheets

SELECT *
FROM CSV(
	'https://docs.google.com/spreadsheets/d/1zhfO8ANC-odz7IDMw065yHbQVPBjtrlo2ukysNlqLlo/gviz/tq?tqx=out:csv&gid=975578095',
	{headers: true}
)

Tips, tricks, and workarounds

String to Number

REPLACE([2017], ",", "")::NUMBER as tomatoes,

Selecting columns with spaces

SELECT [Column Name With Space] AS new_col_name FROM table_name

Tables with unusual headers

When you have tables with misaligned headers and columns, try using {headers:false}. For example, in this url: https://en.wikipedia.org/wiki/Victory_Bowl

SQAnything header

There are fewer headers than columns. Setting the headers = false, results in gathering the correct data:

Set headers as false
SELECT
`0` as date,
`1` as winner,
`2` as wscore,
`3` as loser,
`4` as lscore,
`5` as stadium
FROM HTML("#REPLACE_WITH_YOUR_ID-1", {headers: false}) LIMIT 5

Sources to try

  • HTML tables

    • Wikipedia

    • Jupyter Notebook

  • Notion

  • Excel

  • Sheets

  • CSV

Examples

How many acquisitions do Google and Yahoo make per year?

If you wanted to compare the number of acquisitions Google and Yahoo have made over time, you could pull up the Wikipedia pages on Google’s (or technically Alphabet’s) acquisitions and Yahoo’s acquisitions. Then, you could run the query below after replacing "#yahoo-table-alias" and "#google-table-alias" with their respective table aliases under "Schema" in the extension.

with google as (
SELECT
case when [Acquisition date] like '%,%' then
substr(([Acquisition date]), instr(([Acquisition date]),',')+1,
length(([Acquisition date])))
else substr(([Acquisition date]), instr(([Acquisition date]),' ')+1,
length(([Acquisition date])))
end as year,'google' as acquirer, Company as company
 FROM HTML("#google-table-alias-0", {headers: true})
where [Acquisition date] <> "[to be determined]"
),
yahoo as (
SELECT
case when Date like '%,%' then
substr(( Date), instr((Date),',')+1, length(( Date)))
else substr(( Date), instr(( Date),' ')+1, length(( Date)))
end as year,'yahoo' as acquirer, Company as company
 FROM HTML("#yahoo-table-alias-0", {headers: true})
)
select cast(year as integer) as year,
sum(case when acquirer = "yahoo" then 1 else 0 end)
as  yahoo_acquisitions,
sum(case when acquirer = "google" then 1 else 0 end)
as  google_acquisitions
from (
select year,acquirer, company  from yahoo
union all
select year,acquirer, company from google ) as a group by 1

You have to do some string manipulation to extract the year from the acquisition date columns. If you run that query after replacing the table aliases, and then send the data to Google Sheets to chart, you should get a chart that looks something like below. From the chart, you can see that Yahoo’s acquisitions have slowed down in the last few years; the company was acquired themselves by Verizon in 2017. Alphabet completed a total of 34 acquisitions in 2014, including DeepMind and Nest.

Yahoo and Google Acquisitions chart

Querying Notion

You can query any table in Notion that is in an open tab and is publicly accessible. For example, say you had a table in Notion where you kept track of your favorite movies, like the one below.

Favorite Movies
Title Stars Rotten Tomatoes IMDB Year Director

2001: A space Odyssey

⭐⭐⭐⭐⭐

93

8.3

1968

Stanley Kubrick

Citizen Kane

⭐⭐⭐⭐⭐

100

8.3

1941

Orson Welles

Eternal Sunshine of the Spotless Mind

⭐⭐⭐⭐⭐

93

8.3

2004

Michel Gondry

Moonlight

⭐⭐⭐⭐⭐

94

7.4

2016

Barry Jenkins

Goodfellas

⭐⭐⭐⭐⭐

96

8.7

1990

Martin Scorcesse

Psycho

⭐⭐⭐⭐⭐

96

8.5

1960

Alfred Hitchcock

Dark Knight

⭐⭐⭐⭐⭐

94

9

2008

Christopher Nolan

Schindler’s List

⭐⭐⭐⭐⭐

97

8.9

1993

Steven Spielberg

Casablanca

⭐⭐⭐⭐⭐

99

8.5

1942

Michael Curtiz

Apocalypse Now

⭐⭐⭐⭐⭐

98

8.4

1979

Francis Ford Coppola

8 1/2

⭐⭐⭐⭐⭐

98

8

1963

Federico Fellini

In the Mood for Love

⭐⭐⭐⭐⭐

90

8.1

2000

Wong Kar-wai

Lost in Translation

⭐⭐⭐⭐⭐

95

7.7

2003

Sofia Coppola

Blue Velvet

⭐⭐⭐⭐⭐

94

7.7

1986

David Lynch

The Searchers

⭐⭐⭐⭐⭐

98

7.9

1956

John Ford

You could simply query this table by opening that Notion page in one of your tabs—it should show up under your Schema. If the table does not appear, try selecting "Refresh Schema". To query the table, simply replace its table alias where it says "#replace-notion-table-alias" in the following query:

select *  FROM HTML("#replace-notion-table-alias", {headers: true})

Say you wanted to join the table above with the list of Best Picture Nominated movies to see which of your favorite movies have been nominated for Best Picture. If you opened up the Wikipedia page for Best Picture Nominees, you could join them with your favorite movie table by writing the following query and replacing "#best-picture-noms-table-" with the table aliases for the Best Picture Wikipedia page that shows up under "Schema":

With bp_noms as (
SELECT Year as movie FROM
HTML("#best-picture-noms-table-2", {headers: true})
UNION ALL
SELECT Year as movie FROM
HTML("#best-picture-noms-table-3", {headers: true})
UNION ALL
SELECT Year as movie FROM
HTML("#best-picture-noms-table-4", {headers: true})
UNION ALL
SELECT Year as movie FROM
HTML("#best-picture-noms-table-5", {headers: true})
UNION ALL
SELECT Year as movie FROM
HTML("#best-picture-noms-table-6", {headers: true})
UNION ALL
SELECT Year as movie FROM
HTML("#best-picture-noms-table-7", {headers: true})
UNION ALL
SELECT Year as movie FROM
HTML("#best-picture-noms-table-8", {headers: true})
UNION ALL
SELECT Year as movie FROM
HTML("#best-picture-noms-table-10", {headers: true})
UNION ALL
SELECT Year as movie FROM
HTML("#best-picture-noms-table-11", {headers: true})
),
my_favorites as (
select *  FROM HTML("#replace-notion-table-alias-0", {headers: true})
)

SELECT  case when b.movie is not null then 1 else 0 end as nominated,
a.*  from
my_favorites as a left join bp_noms as b on a.Title = b.movie

The tables in that page are separated by decade so you must use "UNION ALL" to combine all of them into a single table. If you run that query with the example Favorite Movies notion page above, you should see the result that Citizen Kane, Moonlight, Casablanca, Apocalypse Now, and Lost in Translation were both in the favorite movies table and nominated for Best Picture.

Which cities had a World Series victory and Super Bowl victory in the same year?

If you wanted to quickly find out which cities had a World Series and a Super Bowl victory in the same year, you could open up the Wikipedia list of Super Bowl winners, the list of World Series winners, and the list of major North American cities (to help extract the city from the team names). Then simply replace the "#sb-table-alias","#ws-table-alias",and "#city-table-alias" in the query below with the corresponding table aliases that show up under "Schema" after you open those web pages. To clean up this query, it helps to replace "New York" with "New York City" and "New England" with "Boston" for consistency between the tables.

with sb_winners as
(SELECT DateSeason as date_season,
substr((DateSeason), instr((DateSeason),',')+1,
instr((DateSeason),',')-6) as year,
case when [Winning team] like "New York%" then
REPLACE([Winning team],'New York','New York City')
when [Winning team] like "New England%" then
REPLACE([Winning team],'New England','Boston')
else [Winning team] end as winner FROM
HTML("#sb-table-alias-1", {headers: true})),

ws_winners as (

SELECT Year as year, case when [Winning team] like "New York%"
then REPLACE([Winning team],'New York','New York City')
when [Winning team] like "New England%"
then REPLACE([Winning team],'New England','Boston')
else [Winning team] end as winner FROM
HTML("#ws-table-alias-1", {headers: true})
),

cities as (
SELECT Case when City like '%]%' then substr((City), 1, instr((City),'[')-1)
else City end as city
FROM HTML("#city-table-alias-4", {headers: true})
)


select  c.city, ws.winner as ws_winner,
sb.winner as sb_winner, ws.year as ws_year, sb.year as sb_year
from cities as c inner join
ws_winners as ws on ws.winner like c.city || '%' inner join
sb_winners as sb on sb.winner like c.city || '%'
where cast(ws.year as integer) = cast(sb.year as integer)

You should see the result that three different cities had both a World Series victory and a Super Bowl victory in the same year: New York (Mets and Jets) in 1969, Boston (Red Sox and Patriots) in 2004, and Pittsburgh (Pirates and Steelers) in 1979.

SQAnything city and winner chart

Which movies were Joaquin Phoenix and Amy Adams in together?

Say you had a desire to find out which movies Joaquin Phoenix and Amy Adams both acted in. You could open up the Wikipedia pages for Joaquin Phoenix filmography and the list of Amy Adams performances. Then replace "#jp-table-alias" and "#aa-table-alias" in the query below with the corresponding table aliases that show up under "Schema" after you open those Wikipedia pages.

with jp_movies as (
SELECT case when a.Year GLOB  '*[^0-9]*'
THEN a.Year else a.Title end as title
 FROM HTML("#jp-table-alias-0", {headers: true}) as a
),
aa_movies as (
SELECT case when a.Year GLOB  '*[^0-9]*'
THEN a.Year else a.Title end as title
 FROM HTML("#aa-table-alias-0", {headers: true}) as a
)
select jp.title from jp_movies as jp INNER JOIN
aa_movies as aa on jp.title = aa.title
WHERE jp.title <> "TBA"

Since those tables have offset columns for Year and Title, one solution is to check the "Year" column for digits, and if it doesn’t have any then use "Year" instead of "Title" as the title. The reason this happens is that the "Year" column can span multiple rows, so the first row for each year moves all the columns over one. If you run that query, you should get the result that they were both in the movies Her and The Master.

SQAnything film query result

Are Category 4 Hurricanes becoming more common?

You can also easily manipulate or aggregate data in Wikipedia tables and then send the data to Google Sheets to chart. As an example, you could aggregate the number of category 4 hurricanes per year from the Category 4 Hurricane Pacific Wikipedia page and send it to Google Sheets to analyze to see if the hurricanes are becoming more or less common. In the query below, replace "#cat-4-hurricane-table" with the corresponding table alias under "Schema" after you open that page.

SELECT COUNT(*) as cat_4_hurricanes, Season
FROM HTML("#cat-4-hurricane-table-1", {headers: true})
group by Season

You can send the results to Sheets by entering the Spreadsheet ID and Sheet name under "Destination". After you send that query to Sheets and chart the data, you should see a chart that looks like this:

Number of Category 4 Hurricanes per season chart

Was this page helpful?