Isilon: EMC Isilon Command Line

Isilon storage commands and solutions collected from various forums. Please use at your own risk; I am not responsible for any loss of data. Thanks for looking.

Isilon Performance Stats

Summary

isi statistics drive --nodes=all --orderby=busy --type=sas,sata --top

or

isi statistics drive --nodes=all --orderby=busy --type=sas,sata | head -n 30
isi_gather_info # collect status of cluster and send to support (usually auto upload via ftp)

List Packages

cluster-1# isi pkg list

Uninstall Package

cluster-1# isi pkg delete patch-71234

Install Package

cluster-1# isi pkg install patch-71234.tar

isi pkg info

Sessions per node

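isi statistics drive --nodes=2 --top # drive utilization for a specific chassis (node 2 here)

isi statistics drive --top # drive utilization for the current chassis

isi statistics pstat --top # cluster-wide statistics

isi statistics query --nodes=all --stats=node.disk.xfers.rate.sum --top # IOPS for the cluster

Edit for your number of drives per chassis; this example assumes 36 drives.

for i in {0..35}; do isi statistics query --nodes=all --stats=node.disk.xfers.rate.$i; done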
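A minimal sketch of the same loop with the drive count as a parameter, so it can be adapted to a chassis with a different number of bays. The function name disk_xfer_cmds is hypothetical, and it only prints the commands (a dry run) rather than running them.

```shell
# Hypothetical helper: print one "isi statistics query" command per drive slot.
# $1 = number of drives per chassis (defaults to 36, as in the loop above).
disk_xfer_cmds() {
    drives="${1:-36}"
    i=0
    while [ "$i" -lt "$drives" ]; do
        echo "isi statistics query --nodes=all --stats=node.disk.xfers.rate.$i"
        i=$((i + 1))
    done
}

# Dry run for a 12-drive chassis; append "| sh" to run the commands on a node.
disk_xfer_cmds 12
```

Printing first makes it easy to check the stat keys before anything is sent to the cluster.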

cd .snapshot

An admin can manually delete snapshots, or take a snapshot of a specific directory tree instead of the whole OneFS file system.

In OneFS 6.5, the template may be replicated to all nodes. Or maybe that applies only to syslogd and not to sshd, out of concern that it could lock users out of ssh access on all the nodes.

fstat | sort -k 8 -n -r | more # find large open files

isi config

quit

isi_gather_info # gather and upload info, usually required for a support call

isi status # show status info

isi alerts # show alert info

isi_for_array 'df -h'
isi services apache2 enable

isi services cifs disable

isi services cifs enable

isi_for_array -sq 'find /var/crash -type f -size +10000 -exec ls -lh {} \;'

Isilon Shares

isi smb shares create SHARENAME /ifs/*

isi smb shares permission create SHARENAME --group "domain\domain users" --permission-type allow --permission full --zone system

isi smb shares permission create SHARENAME --group "domain\domain admins" --permission-type allow --permission full --zone system

isi smb shares permission delete SHARENAME --wellknown everyone --force

SPNs

# isi auth ads spn create --user=<Administrator> --spn=cifs/<cluster.domain.local>

# isi auth ads spn create --user=<Administrator> --spn=host/<cluster.domain.local>

UNIX Shares

# isi nfs exports create --rwclient=x.x.x.x --rootclient=x.x.x.x --path=/ifs/data/test

QUOTA Creation CLI

cluster-1# isi quota quotas create --path /ifs/XXX/TEST/Admin --type directory --hard-threshold 70G --soft-threshold 60G --soft-grace 7D --advisory-threshold 50G --container yes --include-snapshots no

cluster-1# isi quota quotas create --path /ifs/XXX/TEST/Acct --type directory --hard-threshold 210G --soft-threshold 200G --soft-grace 7D --advisory-threshold 190G --container yes --include-snapshots no

Other Useful Live Monitoring

isi statistics system --nodes --top

isi statistics client --orderby=Ops --top

isi statistics heat --top

isi statistics pstat

isi stat -d

isi_for_array -s 'isi_hw_status -i'

uname -a

Commands

For manual pages, use an underscore (e.g., man isi_statistics). The command line is much more complete than the web interface but not completely documented. Isilon uses zsh with customized tab completion. When opening a new case include output from "uname -a" & "isi_hw_status -i", and run isi_gather_info.

isi_for_array -s: Execute a command on all nodes, in order.

isi_hw_status -i: Node model & serial number -- include this with every new case.

isi status: Node & job status. -n# for particular node, -q to skip job status, -d for SmartPool utilization; we use isi status -qd more often.

isi statistics pstat --top

isi statistics protocol --protocol=nfs --nodes=all --top --long --orderby=Ops

isi networks

isi alerts list -A -w: Review all alerts.

isi alerts cancel all: Clear existing alerts, including the throttled critical errors message. Better than the Quiet command, which can suppress future errors as well.

isi networks --sc-rebalance-all: Redistribute SmartConnect IPs to rebalance load. Not suitable for clusters with CIFS shares.

du -A: Size, excluding protection overhead, from an Isilon node.

du --apparent-size: Size, excluding protection overhead, from a Linux client.

isi devices: List disks with serial numbers.

isi snapshot list --schedule

isi snapshot usage | grep -v '0.0'

isi quota list --show-with-no-overhead

isi quota list --show-with-overhead

isi quota list --recurse-path=/ifs/nl --directory

isi quota modify --directory --path=/ifs/nl --reset-notify-state

isi job pause MultiScan / isi job resume MultiScan

isi job config --path jobs.types.filescan.enabled=False: Disable MultiScan.

isi_change_list (unsupported): List changes between snapshots.

sysctl -n hw.physmem: Check RAM.

isi devices -a smartfail -d 1:bay6 / isi devices -a stopfail -d 1:bay6 (stopfail is not normally appropriate)

isi devices -a add -d 12:10: Use new disk in node 12, bay 10.

date; i=0; while [ $i -lt 36 ]; do isi statistics query --nodes=1-4 --stats=node.disk.xfers.rate.$i; i=$((i+1)); done # Report disk IOPS(?) for all disks in nodes 1-4; 85-120 is apparently normal for SATA drives.

isi networks modify pool --name $NETWORK:$POOL --sc-suspend-node $NODE: Prevent $POOL from offering $NODE for new connections, without interfering with active connections. Use --sc-resume-node to undo.

isi_lcd_d restart: Reset LEDs.

isi smb config global modify --access-based-share-enum=true: Restrict SMB shares to authorized users (global version); isi smb config global list | grep access-based: verify (KB #2837)

ifa isi devices | grep -v HEALTHY: Find problem drives.

isi quota create --path=$PATH --directory --snaps=yes --include-overhead --accounting

cd /ifs; touch LINTEST; isi get -DD LINTEST | grep LIN; rm LINTEST: Find the current maximum LIN

Cluster Performance Snapshot

isi statistics pstat

List files in use

isi statistics heat --nodes=all --orderby=ops --top

List of client connections

isi statistics client --nodes=all --orderby=ops --top

Isilon Performance issue

The WebGUI is OK, but IMO too slow for live monitoring.

On the command line interface (CLI):

isi nfs clients ls

isi perfstat

Check load balancing across all nodes in cluster

isi_for_array "isi smb session list | grep -i computer | wc -l"

How many open files on each node

isi_for_array "isi smb file list | grep -i path | wc -l"

Drive utilization for specific chassis

Drive utilization for current chassis

Cluster wide statistics

IOPS for Cluster

Drive Queue

isi statistics drive --nodes=all --orderby=queued --type=sas,sata --top

Disk IOPS per Chassis/Drive

In your original test you might max out the disk IOPS (xfers), but you could also plateau at a certain rate of "application IOPS" while seeing little or no disk activity at all, because your data is mostly or entirely in the OneFS cache. Check the disk IOPS (xfers), including the average size per xfer, with

isi statistics drive -nall -t --long --orderby=OpsOut

and cache hit rates for data (level 1 & 2) with:

isi_cache_stats -v 2

isi statistics drive -nall --orderby=Inodes --long --top

The latter shows (in very verbose form, which makes it hard to count the number of disks used) the actual layout of the file on the cluster disks. Usually "streaming" access files should spread onto more disks, but on small (or fragmented?) clusters the difference between streaming/random/concurrency might appear minimal.

isi set -l {concurrency|streaming|random} -r g retune "filename"

will actually change the layout if needed. (I trust this more than the WebUI.) Even after it has finished, it might take a few more seconds until the changes show up with isi get -DD.

In case of very effective caching the IOPS will NOT be limited by disk transfers (so all that filesystem block size reasoning doesn't apply).

Instead the limit is imposed by CPU usage, by network bandwidth, or by protocol (network + execution) latency, even if CPU or bandwidth is below 100%.

In the latter case, doing more requests in parallel should be possible (it seems you are on the right track anyway with multiple jobs).

To check protocol latencies, use "isi statistics client" as before and add --long:

isi statistics client --orderby=Ops --top --long

This will show latency times as: TimeMax TimeMin TimeAvg (also useful for --orderby=... !)

Maybe a few things can be checked in advance (before tracking things down to disk level):

- double check that no background jobs are running and stealing CPU or IOPS

- with four clients, is the network traffic well balanced across the four Isilon nodes?

- are the actual NFS read/write sizes large enough for 128K? (server and client negotiate a match within their limits.)

- is the random access pattern really in effect?

- for 128K reads, one could also try the concurrency pattern...

NFS number of threads: This is the number of NFS server daemon threads that are started when the system boots. The OneFS NFS server usually defaults to 16 threads; this value can be changed via the command line interface (CLI):

isi_sysctl_cluster sysctl vfs.nfsrv.rpc.[minthreads,maxthreads]

Increasing the number of NFS daemon threads improves response only minimally; the maximum number of NFS threads should be limited to 64.

I think that's 64 per node (isi_sysctl_cluster just spreads the setting to all nodes). And whether 64 Isilon threads do better or worse than 256 "brand X" threads is up to the implementations; you might need to run tests.

It seems that isi_cache_stats -v prints totals since startup, and it is even more useful when monitoring live deltas at regular intervals such as 5 s: isi_cache_stats -v 5

Be warned: what follows isn't something that I recommend outside a test situation. You can flush all (read) cache from a node or from all nodes using

isi_flush

or isi_for_array -s isi_flush

This will happily flush all the cache warmth built up for your workflow. Use with care: you will immediately take away the cache performance benefit from all your active workflow clients.

The isi_cache_stats tool is a wrapper around the sysctl isi.cache.stats. As you indicated, the data is collected since cluster uptime. The first row returned is typically the global total since uptime or since the last stats reset.

You can run isi_cache_stats -z and then isi_cache_stats 5. This clears the global stats and then starts monitoring, in real time, the blocks that are started and the ones from which you gain a benefit.

The isi_cache_stats output in the non -v case is just a summary of what you see with -v. The only real difference is that it shows sizes in human-readable form rather than as block counts.

BTW: Another lightweight means to look at cluster-wide workflow while you look at isi_cache_stats is

isi perfstat

A means to measure your buffered writes is simply to measure the latency seen for protocol write operations

isi statistics protocol --class=write --orderby=timeavg --top

In OneFS 7.x you should see very fast writes, in the microsecond range. When latency climbs into the millisecond range, the two simple reasons would be

1) The journal cannot flush writes to disk fast enough for the rate of change. This is another way of saying that there are insufficient disks in the node pool to satisfy the demand.

isi statistics drive -nall --orderby=timeinq --long --top

You might note that the sum of OpsIn (writes) + OpsOut (reads) exceeds the normal range for the disk type. You would see more than 1 queued I/O. The more I/Os are queued, the more it is worth looking at increasing the spindle count. Adding nodes almost immediately brings new disks into the fold.

HD Replacement

isi devices # list all devices of the node logged in

isi devices -a status -d 14:bay28 # see status of node 14, drive 28

isi devices -a add -d 14:28 # add the drive (after being replaced)

isi devices -a format -d 14:28 # often need to format the drive for OneFS use first

# it seems that after format it will automatically use drive (no ADD needed)

# other actions are avail, eg smartfail a drive.

isi_for_array -s 'isi devices | grep -v HEALTHY' # list all problematic dev across all nodes of the cluster.

isi statistics drive --long # 6.5 cmd to see utilization of a hd.

user mapper stuff

id username

id windowsDomain\\windowsUser

# Note: the username may be case sensitive!

isi auth ads users list --uid=50034

isi auth ads users list --sid=S-1-5-21-1202660629-813497703-682003330-518282

isi auth ads groups list --gid=10002

isi auth ads groups list --sid=S-1-5-21-1202660629-813497703-682003330-377106

isi auth ads user list -n=ntdom\\username

# find out Unix UID mapping to Windows SID mapping:

# OneFS 6.5 has new commands vs 6.0

isi auth mapping list --source=UID:7868

isi auth mapping rm --source=UID:1000014

isi auth mapping flush --source=UID:1000014 # this clears the cache

isi auth mapping flush --all

isi auth local user list -n="ntdom\username" -v # list isilon local mapping

isi auth mapping delete --source-sid=S-1-5-21-1202660629-813497703-682003330-518282 --target-uid=1000014 --2way

# should delete the sid to uid mapping, both ways.

isi auth mapping delete --target-sid=S-1-5-21-1202660629-813497703-682003330-518282 --source-uid=1000014

# may repeat this if mapping not deleted.

isi auth mapping dump | grep S-1-5-21-1202660629-813497703-682003330-518282

isi auth ads group list --name

isi auth local users delete --name=ntdom\\username --force

RFC 2307 is the preferred auth mechanism: Windows AD with Services for UNIX.

isi smb permission list --sharename=my_share

# find out Unix UID mapping to Windows SID mapping:

isi auth ads users map list --uid=7868

isi auth ads users map list --sid=S-1-5-21-1202660629-813497703-682003330-305726

isi auth ads users map delete --uid=10020

isi auth ads users map delete --uid=10021

isi_for_array -s 'lw-ad-cache --delete-all' # clear the AD cache on all cluster nodes so it gets refreshed

# Windows clients need to unmap and remap the drive for the new UID to be looked up.

# For OneFS 6.0.x only (not 6.5.x, which has a new CIFS backend and no longer uses likewise):

# these queries look up the SID-to-UID and SID-to-GID maps.

sqlite3 /ifs/.ifsvar/likewise/db/idmap.db 'select sid,id from idmap_table where type=1;' # list user sid to uid map

sqlite3 /ifs/.ifsvar/likewise/db/idmap.db 'select sid,id from idmap_table where type=2;' # list group sid to gid map

1: The DB that you are looking at only has the fields that you are seeing listed. With the current output it will give you the SID and UID of the users mapped. With these commands you can find the username that is mapped to that information:

#isi auth ads users list --uid={uid}

#isi auth ads users list --sid={sid}

2: The entries in the DB are made as the users authenticate to the cluster. So when a client tries to access the share, the client sends over the SID. We check the DB, and if no entry is found, we check with NIS/LDAP; if nothing is found there, we generate our own ID (10000 range) and add it to the DB. Any subsequent access from that SID will be mapped to the UID in that DB.

3: You can run the following to get the groups; the same rules apply for the GID and SID lookups:

#sqlite3 /ifs/.ifsvar/likewise/db/idmap.db 'select sid,id from idmap_table where type=2;'

#isi auth ads groups list --gid={gid}

#isi auth ads groups list --sid={sid}

4: You can delete the entries in the database, but the current permissions on files will remain the same. So when the user re-accesses the cluster he will go through the process outlined in point 2.
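The lookup order described in point 2 can be sketched with two flat files standing in for the idmap database and for NIS/LDAP. Everything here is illustrative: the function name lookup_id, the file paths, and the "SID ID" line format are assumptions, and starting generated IDs at exactly 10000 is only one reading of "10000 range". It is not how OneFS itself implements the mapping.

```shell
# Hypothetical stand-ins: one "SID ID" pair per line in each file.
IDMAP_DB="${IDMAP_DB:-/tmp/idmap_sketch.db}"      # plays the role of the idmap DB
DIRECTORY="${DIRECTORY:-/tmp/directory_sketch}"   # plays the role of NIS/LDAP

lookup_id() {
    sid="$1"
    touch "$IDMAP_DB" "$DIRECTORY"
    # 1. An existing DB entry wins.
    id=$(awk -v s="$sid" '$1 == s { print $2; exit }' "$IDMAP_DB")
    # 2. Otherwise ask the directory service.
    [ -n "$id" ] || id=$(awk -v s="$sid" '$1 == s { print $2; exit }' "$DIRECTORY")
    # 3. Otherwise generate the next free ID at or above 10000 and record it.
    if [ -z "$id" ]; then
        id=$(awk 'BEGIN { max = 9999 } $2 > max { max = $2 } END { print max + 1 }' "$IDMAP_DB")
        echo "$sid $id" >> "$IDMAP_DB"
    fi
    echo "$id"
}
```

In this sketch only generated IDs are written back to the DB file; IDs that come from the directory are looked up again on each call.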

Snapshot

Snapshots take up space reported as usable space on the fs.

CIFS

ACL

ls -led # show ACL for the current dir (or file if filename given)

ls -l # regular unix ls, but a + after the permission bits indicates the presence of a CIFS ACL

setfacl -b filename # remove all ACL for the file, turning it back to unix permission

chmod +a user DOMAIN\\username allow generic_all /ifs/path/to/file.txt # place NTFS ACL on file, granting user full access

ls -lR | grep -e "+" -e "/" | grep -B 1 "+" # recursively list files with NTFS ACL, short version

ls -lR | grep -e "^.......... +" -e "/" | grep -B 1 "^.......... +" # morse code version, works better if there are files w/ + in the name
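A sketch of the same idea in awk, which avoids counting dots. It assumes the ls -l layout that appears elsewhere in this post (a standalone + in the second column, as in "drwxrwxr-x + 2 root wheel ...") and remembers the "directory:" headers that ls -R prints. The function name list_acl_files is hypothetical, and file names containing spaces are not handled.

```shell
# Hypothetical helper: read "ls -lR" output on stdin, print paths whose mode
# column is followed by a standalone "+" (assumed to mark an ACL).
list_acl_files() {
    awk '
        BEGIN { dir = "." }
        /:$/ { dir = substr($0, 1, length($0) - 1); next }  # "path:" header from ls -R
        $2 == "+" { print dir "/" $NF }                      # last field = file name
    '
}

# Usage on a node (run from the directory of interest):
#   ls -lR | list_acl_files
```

Because the test is on the second column only, a + inside a file name does not produce a false match.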

Time Sync

isi_for_array -s 'isi auth ads dc' # check which Domain Controller each node is using

isi_for_array -s 'isi auth ads dc --set-dc=MyDomainController' # set DC across all nodes

isi_for_array -s 'isi auth ads time' # check clock on each node

isi auth ads time --sync # force cluster to sync time w/ DC (all nodes)

isi auth ads status # check join status to AD

killall lsassd # reset daemon, auth off for ~30sec, should resolve offline AD problems

"unix" config

Syslog

isi_log_server add SYSLOG_SVR_IP [FILTER]

-or-

vi /etc/mcp/templates/syslog.conf

isi_for_array -sq 'killall -HUP syslogd'

Disable user ssh login to isilon node

For Isilon OneFS 6.0:

vi /etc/mcp/templates/sshd_config

add line

AllowUsers root@*

Then copy this template to all the nodes:

cp /etc/mcp/templates/sshd_config /ifs/ssh_config

isi_for_array 'cp /ifs/ssh_config /etc/mcp/templates/sshd_config'

One may need to restart sshd, but in my experience sshd picks up the new template in less than a minute, after which users are prevented from logging in via ssh.

Isilon WebGui restart

isi services -a isi_webui disable

isi services -a isi_webui enable

Create share permissions using cli

Share permissions are often confused with NTFS security permissions. The share permissions are your first security gate; once a user passes that gate, he is faced with the next security gate, which is the ACL.

Let's take an example:

A share testshare1 was created. While creating the share, the storage admin selected the "Do not Change Existing Permissions" option. He applied Domain Admins => Full Control and Finance Group => Full Control share permissions.

Later he created an ACL from the OneFS command line as follows.

chmod +a group "Paddy\finance" allow dir_gen_all,container_inherit,object_inherit testshare1

This is how it looks from the OneFS CLI:

# ls -lend /ifs/data/testshare1

drwxrwxr-x + 2 root wheel 23 Oct 29 17:47 /ifs/data/testshare1

OWNER: user:root

GROUP: group:wheel

CONTROL:dacl_auto_inherited,dacl_protected

0: everyone allow dir_gen_read,dir_gen_execute

1: user:root allow dir_gen_read,dir_gen_write,dir_gen_execute,std_write_dac,delete_child

2: group:wheel allow dir_gen_read,dir_gen_execute

3: group:Paddy\finance allow dir_gen_all,object_inherit,container_inherit

This allows only the Finance group to write to the directory /ifs/data/testshare1.

When any user from Domain Admins tries to access the share, they may be able to access it, but even though they are powerful domain admins with full control on the share, they may still *not* be able to write to that share or modify the ACLs (Security tab in Explorer) on it, because the ACLs prevent them from doing so.

Whereas any user from the Finance group is able to access the share and modify data and ACLs on that directory.

In your case, review the ACLs on the directory being shared and see if the group or user you are trying to access it as has permissions.

Now, let's say Domain Admins wants to modify the ACLs: you can temporarily change the share permissions for "Domain Admins" to run-as-root, and you will then be able to modify the NTFS security permissions on that directory. Hopefully you have obtained permission from the fictitious Finance group to do so :)

Finding large files

fstat is a bit like lsof in the Linux world, but exists on FreeBSD:

Finding serial number

gathering and uploading info, usually required for a support call

Show status/alert info

Do something on all cluster nodes

I had this problem last week. The /var filesystem was full, but contained few files. This in turn seemed to kill cifs and the web interface, though nfs was fine.

Long story short, it’s probably snmpd, there’s a bug in a version of the isilon os (possibly fixed now).

You can use fstat to find abnormally large open files (unfortunately lsof isn’t present, so I couldn’t see a way to locate unlinked files) and the process that has them open. You can then kill -9 snmpd. After that you can restart services as follows:

isi services apache2 disable

You may also need to kill off webui/smbd (killall -9 isi_webui_d).

Here are some useful Isilon commands to assist you in troubleshooting Isilon storage array issues.

Grep the log for stalled drives on the isilon cluster

grep -o 'stalled: [0-9,*:]*' /var/log/messages | sort | uniq -c

(Stalled drives are bad and can cause cluster problems. You can also run this kind of search against /var/log/restripe.log on the individual nodes.)

Grep the log for stalled drives on the Isilon cluster for a given month (November in this example)

grep 'Nov ' /var/log/messages | grep -o 'stalled: [0-9,*:]*' | sort | uniq -c

Use this on the restripe.log

grep 'Nov ' /var/log/restripe.log |grep -o 'Stalled drives are \[[0-9,*:]*\]'|sort |uniq -c

When reviewing the stalled-drive results, note that the drive numbers listed are logical drive numbers, not bay numbers. Run “isi devices” on the node with the suspect drive to determine which bay the drive is actually in.
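As a sketch, the two steps (extract the stalled: field, then count repeats) can be wrapped in a small function and tried on a sample file before pointing it at /var/log/messages. The function name count_stalled and the sample log lines are made up for illustration; real log lines will look different, and only the "stalled: ..." fragment is taken from the pattern above.

```shell
# Hypothetical helper: count how often each stalled-drive list appears in a log.
# $1 = log file, $2 = optional month prefix such as "Nov " (all lines if omitted).
count_stalled() {
    grep "${2:-}" "$1" | grep -o 'stalled: [0-9,*:]*' | sort | uniq -c
}

# Made-up sample lines, only to exercise the pattern.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Nov  3 01:02:03 node-1 kernel: restripe stalled: 7
Nov  3 01:05:09 node-1 kernel: restripe stalled: 7
Nov  4 11:00:00 node-1 kernel: restripe stalled: 7,12
Oct 30 09:00:00 node-1 kernel: restripe stalled: 3
EOF

count_stalled "$sample" "Nov "
rm -f "$sample"
```

A drive list that shows up with a high count is the one to look up with isi devices first.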

Display the SMART error log of all the drives on a given isilon node:

isi_radish -a|less

Display the current isilon Flexprotect Policy

isi get /ifs

Display the current isilon node hardware status:

isi_hw_status

Display the status of the isilon node network config

isi config

then while in the config utility

status

Display this list of alerts in wide format

isi alerts -w

Start/Stop/Resume/Pause Restriper jobs

isi restripe pause

isi restripe start

isi restripe stop

isi restripe resume -i

Display the drive status of a given isilon node

#for node 3

isi devices -d 3

Display the SAS drives Physical Monitoring stats for errors

less /var/log/isi_sasphymon.acc

Test Active Directory connections from all isilon nodes

isi_for_array wbinfo -t

To find an open file on Isilon Windows share

isi_for_array -q -s smbstatus | grep

then find the PID from the results and then run this to get the user

isi_for_array -q -s smbstatus -u | grep

Note: isi_for_array runs the command on all of the nodes. It will ask for the user’s password so that it can log in to the other nodes and complete the command. When the output of an isi_for_array command is piped to another command such as grep (as in the example above), the password is still required, but no prompt is shown: type the password on the next line and press Enter to get the results.

To Fail the Disk on the node proactively

isi devices -a smartfail -d 11:bay4

To Gather Logs on all Nodes

isi_gather_info -f /var/crash

To see what’s taking up space in the /var/crash partition, run the following command on any node in the cluster:

isi_for_array -qs 'df -h'

To check windows mappings on the isilon

isi auth mapping token --name=enterprise\\userid

Once the SyncIQ run is done, use the following procedure to make the target file systems read-write:

isi sync target break --policy=govindisi01_ifs_hybrid_gridlogs_TO_govindisi02_ifs_hybrid_gridlogs --force

isi sync target break --policy=govindisi01_ifs_hybrid_BRID_DATA_TO_govindisi02_ifs_hybrid_BRID_DATA --force

RE-IMAGE / RE-FORMAT

In certain scenarios (single-node test clusters) you might want to re-image a node; the isi_reimage command can be used to accomplish this. When used with the -b option, it can re-image the node with any build you have media for.

isi_reimage -b OneFS_v5.5.4.21_Install.tar.gz

The isi_reformat_node command can be used to reset the configuration on a node, format the drives, and reimage. The command performs a variety of checks, such as checking wear on SSD drives, before proceeding with the reformat.

isi_reformat_node with the --factory option will format / reimage the node, turn off the NVRAM battery, and power off the node. This is useful if you are pulling a node for long-term storage or shipping it to another site.

As with isi_reimage, you don't want to run either of these commands on a node that is a member of a multi-node cluster.

Rename Node Isilon

One of the great things about the Isilon architecture is that you can add and remove nodes from your cluster.

Let’s say you have a cluster of three 12000X nodes and you want to replace them with three new X200 nodes. You could leave the original nodes in the cluster as a lower / slower tier of storage and use SmartPools to place your different data types on the most appropriate nodes, or you could simply replace your old nodes with new ones.

Suppose my cluster has three 12000X nodes: cluster-1, cluster-2 and cluster-3.

I add three X200 nodes into the cluster, which are assigned the names cluster-4, cluster-5 and cluster-6.

I decide to retire / SmartFail the 12000X nodes and now have a cluster with just three nodes named cluster-4, cluster-5 and cluster-6.

I could leave things exactly as they are, but I’d rather have my three nodes named cluster-1, cluster-2 and cluster-3. No problem: I can rename them (without downtime) using the isi conf command.

From an ssh window, launch isi conf

Cluster-4# isi conf

cluster >>> lnnset 4 1

Node 4 changed to Node 1. Change will be applied on 'commit'

cluster>>> commit

Commit succeeded.

cluster-4#

As you can see above, you may need to reconnect your ssh session before the prompt reflects the new node name.

cluster-4#

cluster-4# hostname

cluster-1

Comments:

Unknown, June 4, 2014 at 9:15 AM
Thanks for the tip on finding the number of LINs on a system. I've taken it a bit further:
isi get -DD LINTEST | awk '$2 ~ /LIN:/ { split($3, a, /:/); x = "0x" a[2] a[3]; print int(x)}'

Monday, November 11, 2013
