- Experience in solving various problems using data analytics and machine learning with Spark, R, Python, the Hadoop ecosystem, and Tableau.
- Responsible for implementation of the MapR Hadoop distribution, OS integration, and application installations.
- Previously worked on MS SQL Server Database Administration for over four years.
- Performance monitoring and fine-tuning of large-scale Hadoop production clusters on Apache and Cloudera as well as AWS cloud servers.
- Areas of interest: machine learning and data processing, IT infrastructure, and cloud systems.
Programming platforms: C, C++, Python, machine learning, Apache Spark, Scala, SQL, and shell scripting. Familiarity with Hadoop HDFS, AWS S3, and R.
Data Analysis: NoSQL data stores, Hadoop, MySQL, MapReduce; large-scale distributed systems design and development; scaling, performance, scheduling, and ETL techniques; C, C++, Java
IT Infrastructure: Root Cause Analysis, Microsoft System Center Operations Manager, Microsoft Server Manager, VMware, Citrix XenCenter, Symantec NetBackup Administrator, Nagios, Zenoss, NetApp, Watchdog.
Windows/Linux Skills: Active Directory, Group Policy, TCP/IP, DHCP, DNS, LDAP, PuTTY, SSH, NTP, FTP, NIS, NFS, Shell Scripting, iSCSI
- Installed a Hadoop cluster and worked with big data analysis tools including Hive
- Implemented multiple nodes on Cloudera
- Imported data from the Linux file system to HDFS
- Transferred data from SQL databases to HBase using Sqoop
- Managed and reviewed Hadoop log files
- Evaluated, architected, and installed/set up the Hortonworks 2.1/1.8 Big Data ecosystem, which includes Apache Hadoop HDFS, Pig, Hive, and Sqoop.
- Maintained and backed up metadata
- Used data integration tools like Sqoop
- Supported the IT department in cluster hardware upgrades.
- Administered, monitored, and maintained a Hadoop 0.20 cluster setup with over 100 TB of available storage.
- Used tools such as Sqoop to import data from RDBMS systems or traditional data warehouses into Hadoop
- Created multi-cluster tests to evaluate the system's performance and failover
- Improved a high-performance cache, leading to greater stability and improved performance.
- Designed and developed business queries using Hive.
- Designed and developed automation test scripts using Python.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Worked with HiveQL on a day-to-day basis to retrieve data from HDFS
- Worked with developers to set up a full Hadoop system on AWS
Environment: Hadoop, HDFS, CDH3, CDH4, HBase, NoSQL, Hive, Pig, Sqoop, AWS S3, EC2, Perl Scripting, Shell Scripting, RHEL 4/5/6, Ubuntu, Red Hat Linux
- Expertise in successfully ramping up business assignments while working in close coordination with clients and ensuring effective service delivery
- A proactive leader with expertise in handling market plan execution, staffing, and targeted marketing; proven ability to achieve pre-set sales targets
- Ability to manage clients' expectations and build relationships, with the distinction of serving reputed corporate clients.
- Proficient in executing tasks and projects, with proven ability to enhance operational effectiveness and meet goals within cost, time, and quality parameters
- Possess excellent analytical, relationship management and interpersonal skills