Hadoop DataStage Engineer Resume
SUMMARY
- Certified Java programmer with 9+ years of IT experience, including several years in Big Data technologies.
- Currently a researcher, developer, and technical lead of a data engineering team that partners with data scientists to develop insights.
- Good exposure to production processes such as change management, incident management, and escalation handling.
- Hands-on experience with major Hadoop ecosystem components, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, IBM DataStage, and Flume, plus knowledge of the MapReduce/HDFS framework.
- Hands-on experience with AWS and Azure clouds, and with installing, configuring, maintaining, monitoring, tuning, and troubleshooting Hadoop clusters across development, test, and production environments.
- Defined file system layouts and dataset permissions.
- Monitored local file system disk usage and log files, and automated log cleanup with scripts.
- Extensive knowledge of front-end technologies such as HTML, CSS, and JavaScript.
- Good working knowledge of OOA/OOD using UML and designing use cases.
- Good communication skills, strong work ethic, and the ability to work efficiently in a team, with good leadership skills.
TECHNICAL SKILLS
Big Data: Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, HBase, MongoDB, Flume, Zookeeper, Oozie.
Operating Systems: Windows, Ubuntu, Red Hat Linux, UNIX
Java Technologies: Java, J2EE, JDBC, JavaScript, SQL, PL/SQL
Programming or Scripting Languages: Java, Python, C, SQL, UNIX shell scripting
Database: MS-SQL, MySQL, Oracle, MS-Access
Middleware: WebSphere, TIBCO
IDEs & Utilities: Eclipse, NetBeans, JCreator
Protocols: TCP/IP, HTTP and HTTPS.
Testing: Quality Center, WinRunner, LoadRunner, QTP
Frameworks: Hadoop, PySpark, Cassandra
PROFESSIONAL EXPERIENCE
Confidential
Hadoop DataStage Engineer
Responsibilities:
- Performed a fresh installation of a 10-node Hadoop HDP 2.6 cluster
- Installed and configured IBM BigIntegrate (DataStage 11.5) and used it to import data from SQL Server into HDFS
- Worked with MongoDB to import the FirstChat database into Hadoop
- Created MLlib models and PySpark jobs to study correlations across databases (a minimal sketch follows this list)
- Used Sqoop to import SQL tables into Hive
- Worked with AWS, Azure, and Spark for machine learning algorithms
- Installed and upgraded Ambari 2.6.1 and set up all Hadoop components
- Worked independently on the project, writing Python programs to map employees across different tables
- Worked with RDBMS sources and star schemas
- Built DataStage transformations that read data through SQL queries and loaded it into Hive
- Assisted in installing Kerberos and LDAP to secure the cluster
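
A minimal PySpark sketch of the kind of cross-database correlation study described above; the table and column names (hr.employees, sales.orders, and their fields) are hypothetical placeholders, not the actual project schema.

```python
# Sketch: pairwise correlations across columns joined from two
# Hive-backed databases. All table/column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.stat import Correlation

spark = (SparkSession.builder
         .appName("db-correlation")
         .enableHiveSupport()
         .getOrCreate())

# Join records from two databases on a shared key.
df = spark.sql("""
    SELECT e.salary, e.tenure_years, o.order_total
    FROM hr.employees e
    JOIN sales.orders o ON e.emp_id = o.emp_id
""")

# MLlib's Correlation API expects a single vector column.
features = VectorAssembler(
    inputCols=["salary", "tenure_years", "order_total"],
    outputCol="features").transform(df)

matrix = Correlation.corr(features, "features", method="pearson")
print(matrix.head()[0])  # DenseMatrix of pairwise Pearson correlations
```

MLlib's DataFrame-based Correlation API returns the full pairwise matrix in a single Spark action, which keeps this kind of exploratory job small.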
Confidential
Hadoop Developer
Responsibilities:
- Worked on an HDP 2.3 distribution for the development cluster
- Updated databases through JDBC to handle insert, update, and delete requests
- Used Hadoop ecosystem components with Java, Linux, Hive, and MapReduce to process data
- Wrote Java MapReduce jobs for processing XML files (a Python streaming equivalent is sketched after this list)
- Built data warehouse data marts and data hubs
- Performed coding in Java Spring and JUnit, managing code in Bitbucket
- Monitored Hadoop cluster performance on Cloudera 5.7
- Created MLlib models and used AWS S3 buckets for real-time processing
- Wrote Linux scripts to process files in lower environments
- Worked in an Agile environment with Git
- Built a 10-node Hadoop cluster and provided production support for cluster maintenance
- Ran jobs through Linux to move files from a TIBCO server to the Hadoop cluster
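
The XML-processing MapReduce jobs above were written in Java; as an illustration, here is an equivalent Hadoop Streaming sketch in Python, assuming one XML record per input line and a hypothetical <status> tag to count.

```python
#!/usr/bin/env python
# Hadoop Streaming stand-in for the Java MapReduce XML jobs above.
# Assumes one XML record per line; the <status> tag is illustrative.
import sys
import xml.etree.ElementTree as ET

def mapper():
    # Emit "status\t1" for each well-formed record on stdin.
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            record = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip malformed records rather than failing the task
        print("%s\t1" % record.findtext("status", default="UNKNOWN"))

def reducer():
    # Streaming delivers mapper output sorted by key, so a running
    # total per key is enough.
    current, count = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = key, 0
        count += int(value)
    if current is not None:
        print("%s\t%d" % (current, count))

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()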
Confidential, New York, NY
Hadoop Developer
Responsibilities:
- Worked on an HDP 2.3 distribution for the development cluster
- Set up incoming data flows with NiFi and Kafka to receive data into HDFS (see the streaming sketch after this list)
- Built user access control, visualization, and settings modules
- Prepared formatted data for chart controllers
- Developed user access control using a Spring interceptor and JDBC
- Used Hadoop ecosystem components with Python, Spark, Hive, and MapReduce to process data
- Created DataStage 11.0 jobs to ingest data into HDFS and Hive
- Wrote MapReduce jobs for processing XML and flat files
- Used Scala and the Cassandra database for processing jobs, and QlikSense for visualization
- Worked on machine learning with Python
- Built multi-node Hadoop clusters and provided production support for cluster maintenance
- Provided strategic direction to the team
- Assigned work to subordinates
- Completed node replacements and scaled the cluster
- Tracked risks and reported them to the project manager
- Provided project status to senior management
- Administered a 10-node Hortonworks Data Platform cluster with 550 GB of RAM, 10 TB of SSD storage, and 8 cores
- Analyzed the Hadoop stack and various big data analytics tools, including Pig, Hive, the HBase database, and Sqoop
- Performed Hadoop administration, managing cluster maintenance and ZooKeeper
- Monitored cluster performance through Ambari and Ganglia
- Triggered workflows based on time or data availability using the Oozie Coordinator Engine
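
A minimal Spark Structured Streaming sketch of the Kafka-to-HDFS landing flow described above, assuming a Spark 2.x-era cluster with the spark-sql-kafka connector available; the broker address, topic name, and HDFS paths are hypothetical placeholders.

```python
# Sketch: consume a Kafka topic (fed by NiFi) and land raw events in
# HDFS as Parquet. Broker, topic, and paths are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "incoming-events")
          .option("startingOffsets", "earliest")
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"))

# Checkpointing gives exactly-once file output for the sink.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/incoming-events")
         .option("checkpointLocation", "hdfs:///checkpoints/incoming-events")
         .start())

query.awaitTermination()
```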
Confidential, Louisville, KY
Hadoop Admin Lead
Responsibilities:
- Worked on an HDP 2.0 distribution for the development cluster
- Loaded datasets daily from two sources, Oracle and MySQL, into HDFS and Hive respectively
- Worked as Hadoop admin, monitoring the production cluster and implementing cluster changes
- Worked with Amazon Web Services
- Received an average of 80 GB of data per day into the data warehouse and processed it on a 12-node cluster
- Loaded data from Linux systems into HDFS
- Wrote MapReduce and PySpark jobs for cleansing data and applying algorithms (a cleansing sketch follows this list)
- Used the Cassandra database to transform query results into Hadoop HDFS
- Designed scalable big data cluster solutions
- Monitored job status through emails received from cluster health monitoring tools
- Responsible for managing data coming from different sources
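
A minimal PySpark cleansing sketch in the spirit of the daily Oracle/MySQL loads described above; the landing path, table names, and column names are hypothetical placeholders.

```python
# Sketch: cleanse one daily landed batch and append it to a Hive
# warehouse table. All names and paths are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("daily-cleanse")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.read.parquet("hdfs:///landing/orders/2024-01-01")

cleansed = (raw
            .dropDuplicates(["order_id"])                  # de-duplicate on key
            .filter(F.col("order_total").isNotNull())      # drop incomplete rows
            .withColumn("customer_name",
                        F.trim(F.upper(F.col("customer_name"))))  # normalize text
            .withColumn("load_date", F.current_date()))    # audit column

# Append the cleansed batch to the Hive warehouse table.
cleansed.write.mode("append").saveAsTable("warehouse.orders_clean")
```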