这一章也是讲窗口函数,主要是LAG

Window LAG

1
2
3
4
5
6
7
8
9
1.The example uses a WHERE clause to show the cases in 'Italy' in March.
Modify the query to show data from Spain

SELECT name, DAY(whn),
confirmed, deaths, recovered
FROM covid
WHERE name = 'Spain'
AND MONTH(whn) = 3
ORDER BY whn
1
2
3
4
5
6
7
8
9
2.The LAG function is used to show data from the preceding row or the table. When lining up rows the data is partitioned by country name and ordered by the data whn. That means that only data from Italy is considered.
Modify the query to show confirmed for the day before.

SELECT name, DAY(whn), confirmed,
LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn)
FROM covid
WHERE name = 'Italy'
AND MONTH(whn) = 3
ORDER BY whn
1
2
3
4
5
6
7
8
3.The number of confirmed case is cumulative - but we can use LAG to recover the number of new cases reported for each day.
Show the number of new cases for each day, for Italy, for March.

SELECT name, DAY(whn), confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn)
FROM covid
WHERE name = 'Italy'
AND MONTH(whn) = 3
ORDER BY whn
1
2
3
4
5
6
7
8
9
4.The data gathered are necessarily estimates and are inaccurate. However by taking a longer time span we can mitigate some of the effects.
You can filter the data to view only Monday's figures WHERE WEEKDAY(whn) = 0.
Show the number of new cases in Italy for each week - show Monday only.

SELECT name, DATE_FORMAT(whn,'%Y-%m-%d'), confirmed-LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn)
FROM covid
WHERE name = 'Italy'
AND WEEKDAY(whn) = 0
ORDER BY whn
1
2
3
4
5
6
7
8
9
10
11
5.You can JOIN a table using DATE arithmetic. This will give different results if data is missing.
Show the number of new cases in Italy for each week - show Monday only.
In the sample query we JOIN this week tw with last week lw using the DATE_ADD function.

SELECT tw.name, DATE_FORMAT(tw.whn,'%Y-%m-%d'),
tw.confirmed - lw.confirmed
FROM covid tw LEFT JOIN covid lw ON
DATE_ADD(lw.whn, INTERVAL 1 WEEK) = tw.whn
AND tw.name=lw.name
WHERE tw.name = 'Italy' and WEEKDAY(tw.whn) = 0
ORDER BY tw.whn
1
2
3
4
5
6
7
8
9
10
11
6.Include the ranking for the number of deaths in the table.

SELECT
name,
confirmed,
RANK() OVER (ORDER BY confirmed DESC) rc,
deaths,
RANK() OVER (ORDER BY deaths DESC) rc1
FROM covid
WHERE whn = '2020-04-20'
ORDER BY confirmed DESC
1
2
3
4
5
6
7
8
9
10
11
12
7.The query shown includes a JOIN t the world table so we can access the total population of each country and calculate infection rates (in cases per 100,000).
Show the infect rate ranking for each country. Only include countries with a population of at least 10 million.

SELECT
world.name,
ROUND(100000*confirmed/population,0)as infect_rate,
RANK() OVER(ORDER BY infect_rate)
FROM covid JOIN world ON covid.name=world.name
WHERE whn = '2020-04-20' AND population > 10000000
ORDER BY population DESC

感觉答案好像出错了,不正常...
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
8.For each country that has had at last 1000 new cases in a single day, show the date of the peak number of new cases.

select table1.name,DATE_FORMAT(table1.whn,'%Y-%m-%d'),table1.c1 from
(select newcases.name,newcases.whn,
rank() over(partition by name order by c1 desc)as rk,newcases.c1 from
(select name,whn,
(confirmed-lag(confirmed,1) over(partition by name order by whn))as c1
from covid
)as newcases
)as table1
where table1.rk = 1 and table1.c1 >= 1000
order by table1.whn

气抖冷,其实跟答案是一样的但是顺序不一样,一直排不出来答案的顺序害
感觉有一个问题是题目要求at least,应该是筛选出每天的新增数都要大于1000的国家,那应该是min<=1000,又突然发现第一天可能是null,于是乎排除掉有null的行,结果出错了,不晓得为啥...

Summary

  • 这次主要是LAG函数,貌似跟之前接触过的LEAD函数很像,改天一定好好整理下窗口函数= 。 =