Ferramentas para capturar e converter a Web

Capturar tabelas HTML de sites com PythonAPI Python

Existem v√°rias maneiras de converter tabelas HTML into CSV e planilhas do Excel usando API Python do GrabzIt, detalhadas aqui est√£o algumas das t√©cnicas mais √ļteis. No entanto, antes de come√ßar, lembre-se de que depois de ligar para o URLToTable, HTMLToTable or FileToTable m√©todos os Save or SaveTo O m√©todo deve ser chamado para capturar a tabela. Se voc√™ quiser ver rapidamente se este servi√ßo √© adequado para voc√™, tente uma demonstra√ß√£o ao vivo da captura de tabelas HTML de um URL.

Op√ß√Ķes B√°sicas

O snippet de código abaixo converte automaticamente a primeira tabela HTML em uma página da web especificada into Um documento CSV que pode ser baixado ou analisado.

grabzIt.URLToTable("https://www.tesla.com")
# Then call the Save or SaveTo method
grabzIt.HTMLToTable("<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>")
# Then call the Save or SaveTo method
grabzIt.FileToTable("tables.html")
# Then call the Save or SaveTo method

Por padr√£o, isso converter√° a primeira tabela que identifica intuma mesa. No entanto, a segunda tabela em uma p√°gina da web pode ser convertida passando um 2 para o tableNumberToInclude atributo.

from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.tableNumberToInclude = 2

grabzIt.URLToTable("https://www.tesla.com", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.csv")
from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.tableNumberToInclude = 2

grabzIt.HTMLToTable("<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.csv")
from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.tableNumberToInclude = 2

grabzIt.FileToTable("tables.html", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.csv")

Você também pode especificar o targetElement atributo que garantirá que apenas tabelas dentro do ID do elemento especificado sejam convertidas.

from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.targetElement = "stocks_table"

grabzIt.URLToTable("https://www.tesla.com", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.csv")
from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.targetElement = "stocks_table"

grabzIt.HTMLToTable("<html><body><table id='stocks_table'><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.csv")
from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.targetElement = "stocks_table"

grabzIt.FileToTable("tables.html", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.csv")

Como alternativa, você pode capturar todas as tabelas em uma página da Web passando true para o includeAllTables atributo, no entanto, isso funcionará apenas com os formatos XLSX e JSON. Essa opção colocará cada tabela em uma nova planilha na pasta de trabalho da planilha gerada.

from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.format = 'xlsx'
options.includeAllTables = True

grabzIt.URLToTable("https://www.tesla.com", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.xlsx")
from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.format = 'xlsx'
options.includeAllTables = True

grabzIt.HTMLToTable("<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.xlsx")
from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.format = 'xlsx'
options.includeAllTables = True

grabzIt.FileToTable("tables.html", options)
# Then call the Save or SaveTo method
grabzIt.SaveTo("result.xlsx")

Converter tabelas HTML em JSON

O uso do servi√ßo de convers√£o de tabelas HTML do Python e GrabzIt permite converter tabelas HTML into JSON. O primeiro passo, como mostrado abaixo, √© especificar json no par√Ęmetro format. Em seguida, obtemos o JSON string sincronicamente com o SaveTo m√©todo, voc√™ pode usar seu analisador JSON favorito para Python para converter o JSON string intum objeto.

from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.format = "json"
options.tableNumberToInclude = 1

grabzIt.URLToTable("https://www.tesla.com", options)

json = grabzIt.SaveTo()
from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.format = "json"
options.tableNumberToInclude = 1

grabzIt.HTMLToTable("<html><body><table><tr><th>Name</th><th>Age</th></tr>
    <tr><td>Tom</td><td>23</td></tr><tr><td>Nicola</td><td>26</td></tr>
    </table></body></html>", options)

json = grabzIt.SaveTo()
from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.format = "json"
options.tableNumberToInclude = 1

grabzIt.FileToTable("tables.html", options)

json = grabzIt.SaveTo()

Identificador Personalizado

Você pode passar um identificador personalizado para o mesa Como mostrado abaixo, esse valor é retornado ao seu manipulador GrabzIt Python. Por exemplo, esse identificador personalizado pode ser um identificador de banco de dados, permitindo que uma captura de tela seja associada a um registro específico do banco de dados.

from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.customId = "123456"

grabzIt.URLToTable("https://www.tesla.com", options)
# Then call the Save method
grabzIt.Save("http://www.example.com/handler.py")
from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.customId = "123456"

grabzIt.HTMLToTable("<html><body><h1>Hello World!</h1></body></html>", options)
# Then call the Save method
grabzIt.Save("http://www.example.com/handler.py")
from GrabzIt import GrabzItTableOptions
from GrabzIt import GrabzItClient

grabzIt = GrabzItClient.GrabzItClient("Sign in to view your Application Key", "Sign in to view your Application Secret")

options = GrabzItTableOptions.GrabzItTableOptions()
options.customId = "123456"

grabzIt.FileToTable("example.html", options)
# Then call the Save method
grabzIt.Save("http://www.example.com/handler.py")