json – The Ubuntu Incident

Extract the significant parts of a web page

October 1, 2017 Jabba Laci

Problem
From a web page you want to extract the significant parts: title, author, date of publication, body, etc.

Solution
Mercury Web Parser does exactly this. It’s free. After registration you get an API key. Their web service returns a structured JSON response. I tried it with my previous post:

curl -H "x-api-key: <my_api_key>" "https://mercury.postlight.com/parser?url=https://ubuntuincident.wordpress.com/2017/10/01/re-run-a-command-in-the-terminal-every-x-seconds/" | python3 -m json.tool

Output:

{
    "title": "Re-run a command in the terminal every X\u00a0seconds",
    "author": "Jabba Laci",
    "date_published": "2017-09-30T22:02:19.000Z",
    "dek": null,
    "lead_image_url": "https://secure.gravatar.com/blavatar/db6c398dc21dc8e8f82d7bc83130c0ab?s=200&ts=1506809436",
    "content": "<div class=\"content\"> <p><strong>Problem</strong><br>\nYou want re-execute a command in the terminal every X seconds. For instance, you copy a lot of big files to a partition and you want to monitor the size of the free space on that partition.</p>\n<p><strong>Solution</strong><br>\nA naive and manual approach to the problem mentioned above is to execute the commands “<code>clear; df -h</code>” regularly, say every 2 seconds.</p>\n<p>A better way is to use the command “<code>watch</code>“. Usage:</p>\n<pre> watch -n 2 df -h </pre>\n<p>That is: execute “<code>df -h</code>” every two seconds. <code>watch</code> will also clear the screen and print the result to the top. You can quit with <code>Ctrl + c</code>.</p>\n<p>Tip from <a href=\"https://askubuntu.com/questions/430382/repeat-a-command-every-x-interval-of-time-in-terminal\">here</a>.</p> </div>",
    "next_page_url": null,
    "url": "https://ubuntuincident.wordpress.com/2017/10/01/re-run-a-command-in-the-terminal-every-x-seconds/",
    "domain": "ubuntuincident.wordpress.com",
    "excerpt": "Problem You want re-execute a command in the terminal every X seconds. For instance, you copy a lot of big files to a partition and you want to monitor the size of the free space on that partition.\u2026",
    "word_count": 108,
    "direction": "ltr",
    "total_pages": 1,
    "rendered_pages": 1
}

Pretty impressive.
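The same request can be made from a Python script. The following is a minimal stdlib-only sketch. It assumes the endpoint and the `x-api-key` header from the curl example above are still accepted, because the hosted service may no longer be available. The helper names `build_request`, `summarize` and `parse_page` are hypothetical. One advantage over the raw curl command is that `urlencode()` percent-encodes the target URL, so a page URL that contains its own `?` or `&` cannot be mistaken for extra parameters of the parser's query string.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example; it may no longer be reachable.
API_ENDPOINT = "https://mercury.postlight.com/parser"

def build_request(api_key, page_url):
    """Build the GET request; urlencode() percent-encodes the target page URL."""
    query = urllib.parse.urlencode({"url": page_url})
    return urllib.request.Request(
        f"{API_ENDPOINT}?{query}",
        headers={"x-api-key": api_key},
    )

def summarize(article):
    """Keep only a few fields of the parsed JSON response (missing keys become None)."""
    keys = ("title", "author", "date_published", "word_count")
    return {key: article.get(key) for key in keys}

def parse_page(api_key, page_url):
    """Send the request and return the summarized response (needs network access)."""
    with urllib.request.urlopen(build_request(api_key, page_url), timeout=10) as resp:
        return summarize(json.load(resp))
```

Keeping request building and response handling in separate functions lets you test both without hitting the network.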

Categories: web Tags: json, parser, web service

JSON Path

May 20, 2017 Jabba Laci

I wrote a command-line program that outputs the full path of every key/value pair in a JSON file.

Example

$ ./json_path.py sample.json
root.a => 1
root.b.c => 2
root.b.friends[0].best => Alice
root.b.friends[1].second => Bob
root.b.friends[2][0] => 5
root.b.friends[2][1] => 6
root.b.friends[2][2] => 7
root.b.friends[3][0].one => 1
root.b.friends[3][1].two => 2

More information at the project’s GitHub page.
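The core idea can be sketched in a few lines of Python. This is a hypothetical re-implementation, not the code of json_path.py. The post does not show sample.json, so the document below is reconstructed from the example output and should be treated as an assumption. Dictionary keys are appended with a dot, list indices with `[i]`, and a line is printed whenever a leaf value is reached.

```python
import json

def json_paths(node, path="root"):
    """Yield (path, value) pairs for every leaf of a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from json_paths(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from json_paths(value, f"{path}[{i}]")
    else:
        # scalar leaf: string, number, bool or None
        yield path, node

# Reconstructed (hypothetical) content of sample.json
sample = json.loads("""
{"a": 1,
 "b": {"c": 2,
       "friends": [{"best": "Alice"}, {"second": "Bob"}, [5, 6, 7],
                   [{"one": 1}, {"two": 2}]]}}
""")

for p, v in json_paths(sample):
    print(f"{p} => {v}")
```

Two design details to be aware of: empty objects and arrays produce no output line, and leaves are printed with Python's formatting (`True`, `None`) rather than JSON's (`true`, `null`).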

Categories: bash, python Tags: json, path

Detailed Twitter info in JSON: an undocumented feature

October 24, 2016 Jabba Laci

Problem
Using a script, I wanted to figure out the number of my followers on Twitter. Here is my (mostly abandoned) Twitter page: https://twitter.com/szathmar. I didn’t want to use the API because I didn’t want to register for an API key, so I took the easy way: scrape the necessary data :) Digging in the HTML code, I found the number of followers, but I also found a hidden treasure!

Solution
The hidden treasure is a long JSON string that contains all kinds of information about a Twitter user:

The screenshot shows only an extract; the JSON string is much longer. Fine, let’s get it!

#!/usr/bin/env python3
# coding: utf-8

import json
import readline  # imported for its side effect: line editing in input()
import sys

import requests
from bs4 import BeautifulSoup

def main():
    url = input("Full twitter URL: ")
    r = requests.get(url, timeout=10)
    r.raise_for_status()
    soup = BeautifulSoup(r.text, "lxml")  # needs the lxml package

    # the profile data sits in the value attribute of <input class="json-data">
    tag = soup.find('input', {'class': 'json-data'})
    if tag is None:
        sys.exit("No json-data tag found on this page.")
    d = json.loads(tag['value'])
    print(json.dumps(d, indent=4))

    # followers = d['profile_user']['followers_count']
    # print(followers)

##############################################################################

if __name__ == "__main__":
    main()

If you want, for instance, the number of followers, uncomment the last two lines of main().

Thank you, Twitter! It’s really nice of you to provide all this data in JSON!

Sample
The JSON that I could extract from my page is 743 lines long! Here is an extract of it:

...
"profile_image_url": "http://pbs.twimg.com/profile_images/459783802395430912/vcMT0CGX_normal.png",
"business_profile_state": "none",
"url": null,
"profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme6/bg.gif",
"screen_name": "szathmar",
"is_translator": false,
"friends_count": 123,
"followers_count": 70,
"profile_text_color": "333333",
"profile_link_color": "FF3300",
"translator_type": "none",
"profile_background_color": "709397",
...
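If you want to experiment with the extraction offline, without requests or BeautifulSoup, the standard library's `html.parser` can do the same lookup. This is a minimal sketch under two assumptions: the data sits in the `value` attribute of an `<input class="json-data">` tag, as in the script above, and the HTML snippet used here is a hypothetical, heavily shortened stand-in for the real page. The class and function names are also hypothetical. `HTMLParser` unescapes entities such as `&quot;` in attribute values, so the result can be passed straight to `json.loads`.

```python
import json
from html.parser import HTMLParser

class JsonDataFinder(HTMLParser):
    """Remember the value attribute of the first <input class="json-data"> tag."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()  # class is multi-valued
        if self.value is None and tag == "input" and "json-data" in classes:
            self.value = attrs.get("value")

def extract_json_data(html):
    """Return the parsed JSON embedded in the page, or raise ValueError."""
    finder = JsonDataFinder()
    finder.feed(html)
    finder.close()
    if finder.value is None:
        raise ValueError('no <input class="json-data"> tag found')
    return json.loads(finder.value)

# Hypothetical, shortened page fragment
sample_html = (
    '<input type="hidden" class="json-data" value="{&quot;profile_user&quot;: '
    '{&quot;screen_name&quot;: &quot;szathmar&quot;, &quot;followers_count&quot;: 70}}">'
)
data = extract_json_data(sample_html)
print(data["profile_user"]["followers_count"])
```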

Categories: python Tags: hidden, json, twitter, undocumented

show my position on the map

June 21, 2016 Jabba Laci

The site http://ipinfo.io/ gives you back not only your IP address, but your geolocation too. Example (with a fake IP):

$ curl http://ipinfo.io/
{
  "ip": "734.675.653.542",
  "hostname": "No Hostname",
  "city": "Debrecen",
  "region": "Debrecen",
  "country": "HU",
  "loc": "47.5333,21.6333",
  "org": "...",
  "postal": "..."
}

Let’s visualize my location:

<img src="https://maps.googleapis.com/maps/api/staticmap?center=47.5333,21.6333&zoom=9&size=480x240&sensor=false">

Debrecen, Hungary, center of the world :)
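The step from the ipinfo.io answer to the map image can be scripted. The following is a small sketch that reads the `loc` field ("latitude,longitude") from the JSON and builds the same static map URL as the `<img>` tag above. The function name is hypothetical, and the `sensor` parameter is left out. Also note that Google's service may nowadays expect an additional `key` parameter, which this sketch does not add.

```python
import json
from urllib.parse import urlencode

STATIC_MAP = "https://maps.googleapis.com/maps/api/staticmap"

def static_map_url(ipinfo_json, zoom=9, size="480x240"):
    """Turn an ipinfo.io JSON response into a static map URL centred on 'loc'."""
    info = json.loads(ipinfo_json)
    lat, lon = (float(x) for x in info["loc"].split(","))  # validates the format
    # safe="," keeps the comma between latitude and longitude unescaped
    query = urlencode({"center": f"{lat},{lon}", "zoom": zoom, "size": size}, safe=",")
    return f"{STATIC_MAP}?{query}"

sample = '{"ip": "734.675.653.542", "city": "Debrecen", "loc": "47.5333,21.6333"}'
print(static_map_url(sample))
```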

Categories: Uncategorized Tags: geolocation, google map, ip, json

Firefox: restore your lost tabs

April 30, 2014 Jabba Laci

Problem
Over the last 1.5-2 years, I had collected 700+ tabs in my Firefox :) Maybe this summer I will have some time to sort them out. However, today when I switched my computer on, all my tabs were gone and I got a clean Firefox instance with only one tab. Hmm… I had a similar problem once, after which I installed an add-on called “Session Manager”. In this add-on, I had enabled the option to offer the list of previous sessions upon restart, but it didn’t do anything! Damn, how do I get my tab collection back?

Solution
Your Firefox profile directory (under ~/.mozilla/firefox/) contains a file called sessionstore.js that stores, among other things, the open tabs. However, this file was very small; my previous tabs were clearly not in it. Thank God there was a backup copy of this file next to it called sessionstore.bak. It was a big file, and its timestamp indicated that it had been created 2 days earlier, when everything was OK with my tabs.

So, how to extract the old tabs from sessionstore.bak?

This is a JSON file, but it’s not pretty-printed. I suggest copying it to a separate directory where you can experiment with it. First, let’s make it readable:

$ python -m json.tool sessionstore.bak > session.json

Now you can open session.json with a text editor. You will find lines with a “url” key, but there are a huge number of them: I had lost 731 tabs, yet this file contained 6500+ URLs. As it turned out, it also contains the URLs of closed tabs. How do you extract the URLs of the open tabs only?

Again, Python came to my rescue. After analyzing the structure of this JSON file, I could extract the tab URLs the following way:

$ python3
>>> import json
>>> with open('session.json') as f:  # input file
...     d = json.load(f)
...
>>> tabs = d["windows"][0]["tabs"]  # tabs of the first window
>>> with open('tabs.txt', 'w') as g:  # output file
...     for t in tabs:
...         print(t["entries"][0]["url"], file=g)  # first history entry of the tab
...
>>> len(tabs)
731    # Yeah! All of them are here!

The URLs of the lost tabs are now in the tabs.txt file.

I didn’t turn this into a script, but feel free to do so. From now on, I will make regular backups of my open tabs with the URL Lister add-on.
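Here is one possible way to turn the session above into reusable functions. It is a sketch, not a tested recovery tool, and it rests on assumptions about the session file format that you should verify against your own file. It assumes that every tab has an `entries` list holding the tab's history and a 1-based `index` that points at the page currently shown. It also assumes that closed tabs live under a separate `_closedTabs` key, so walking `windows[*].tabs` skips them. Unlike the session above, it visits all windows and picks the current history entry instead of `entries[0]`. The function names are hypothetical.

```python
import json

def current_url(tab):
    """URL shown in a tab, assuming 'index' is a 1-based pointer into 'entries'."""
    entries = tab.get("entries") or []
    if not entries:
        return None
    i = tab.get("index", len(entries)) - 1  # assumed fallback: last entry
    i = max(0, min(i, len(entries) - 1))     # clamp to a valid position
    return entries[i].get("url")

def extract_tab_urls(session):
    """Open-tab URLs across all windows; closed tabs ('_closedTabs') are not visited."""
    urls = []
    for window in session.get("windows", []):
        for tab in window.get("tabs", []):
            url = current_url(tab)
            if url:
                urls.append(url)
    return urls

def write_tab_urls(session_path, output_path):
    """Write one URL per line and return how many were written."""
    with open(session_path, encoding="utf-8") as f:
        urls = extract_tab_urls(json.load(f))
    with open(output_path, "w", encoding="utf-8") as g:
        for url in urls:
            print(url, file=g)
    return len(urls)
```

Usage, with the file names from above: `write_tab_urls("session.json", "tabs.txt")`.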

Categories: firefox, python Tags: json, lost, restore, session, tab

Google’s URL shortener

June 25, 2013 Jabba Laci

Problem
You want to shorten a long URL from the command line / from a script.

Solution
There are lots of URL shorteners. With the Google URL shortener you can do it like this:

curl https://www.googleapis.com/urlshortener/v1/url -H 'Content-Type: application/json' -d '{"longUrl": "https://ubuntuincident.wordpress.com"}'

Sample output:

{
    "kind": "urlshortener#url",
    "id": "http://goo.gl/Zeigx",
    "longUrl": "https://ubuntuincident.wordpress.com/"
}

Exercise
Let’s do it in Python using the requests module:

import requests

url = "https://www.googleapis.com/urlshortener/v1/url"
data = {"longUrl": "https://ubuntuincident.wordpress.com"}
# json= serializes the payload and sets the Content-Type: application/json header
r = requests.post(url, json=data)
print(r.text)
print('Short URL:', r.json()["id"])

Links

Shorten a long URL @developers.google.com

Categories: python Tags: curl, Google URL shortener, json, requests, url shortener

jq — a lightweight and flexible command-line JSON processor

October 21, 2012 Jabba Laci

“jq is like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.

jq is written in portable C, and it has zero runtime dependencies.

jq can mangle the data format that you have into the one that you want with very little effort…” (link)

Check out the tutorial here.

You can also use jq to pretty print an ugly JSON file:

jq '.' ugly_one_liner.json

Categories: bash Tags: jq, json, pretty-print, query

GitHub: contact watchers

September 7, 2011 Jabba Laci

Problem
You want to contact all the watchers of a project. For instance, you want to notify them about some radical changes.

Solution
Simply click on the “Eye” icon that shows the number of watchers. It will list the watchers of the project.

Or, you can get the list of watchers through an API:

curl http://github.com/api/v2/json/repos/show/USERNAME/REPONAME/watchers?full=1 | python -mjson.tool

More info about the Repositories API: here. General information about the APIs: here.
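The v2 API used in the curl command has since been retired. In GitHub's current REST API, a repository's watchers are listed by `GET /repos/{owner}/{repo}/subscribers`. The following is a minimal stdlib sketch built on that endpoint. It only fetches the first page (up to `per_page=100` users) and does not follow the pagination links, and the helper names are hypothetical.

```python
import json
import urllib.request

def subscribers_url(owner, repo, per_page=100):
    """REST API endpoint that lists a repository's watchers ('subscribers')."""
    return f"https://api.github.com/repos/{owner}/{repo}/subscribers?per_page={per_page}"

def watcher_logins(owner, repo):
    """Fetch the first page of watchers and return their login names (needs network)."""
    req = urllib.request.Request(
        subscribers_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return [user["login"] for user in json.load(resp)]
```

For example, `watcher_logins("USERNAME", "REPONAME")` returns a list of login names. The API returns user profiles, not e-mail addresses, so contacting the watchers still happens through GitHub itself.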

Categories: Uncategorized Tags: api, followers, github, json, watchers

Pretty print a JSON file

August 10, 2011 Jabba Laci

This post is based on the following SO threads: one; two.

Problem

You have an unreadable JSON file from which you want to extract some data… How do you prettify it, i.e. make it human-readable?

Solution

There are web-based and command-line solutions. As an extra, we show you how to do it in Vim too.

Web-based prettifiers

http://chris.photobooks.com/json/default.htm (it can show you the path of a tag too)
http://pretty-print.org/
http://jsonviewer.stack.hu/
http://www.shell-tools.net/index.php?op=json_format
http://jsonlint.com/
http://jsonformatter.curiousconcept.com/ (formatter and validator)

Command-line beautifiers

curl -s http://www.reddit.com/r/nsfw/.json | python -mjson.tool
sudo apt-get install edit-json; prettify_json myfile.json
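If you need the same result inside a Python script rather than on the command line, `json.dumps` does the work that `python -m json.tool` does. A minimal sketch (the function name `pretty` is hypothetical): `sort_keys=True` makes the output stable for diffs, and `ensure_ascii=False` keeps characters such as a non-breaking space readable instead of printing escapes like `\u00a0`.

```python
import json

def pretty(text, indent=4):
    """Re-serialize a JSON string with indentation and sorted keys."""
    return json.dumps(json.loads(text), indent=indent, sort_keys=True, ensure_ascii=False)

print(pretty('{"b": 1, "a": [1, 2]}'))
```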

Vim :)

This tip is based on this post: Editing json files in vim.

In my .vimrc file I had to add the following lines:

" pretty-print JSON files
autocmd BufRead,BufNewFile *.json set filetype=json
" json.vim is here: http://www.vim.org/scripts/script.php?script_id=1945
autocmd Syntax json sou ~/.vim/syntax/json.vim
" json_reformat is part of yajl: http://lloyd.github.com/yajl/
autocmd FileType json setlocal equalprg=json_reformat

When you open a .json file, it will be colored using the json.vim syntax file. Selecting some text and pressing “=” will reformat the selection with json_reformat.

Firefox add-on

There are several JSON visualizer add-ons for Firefox, e.g. JSONView.

Categories: bash, firefox, python, vim Tags: json, json path, json.tool, json.vim, json_reformat, prettify, prettify_json, pretty-print, reddit, vimrc, yajl
