Hadoop DataStage Engineer Resume
SUMMARY
- Certified Java programmer with 9+ years of IT experience, including several years in Big Data technologies.
- Currently a researcher, developer, and technical lead of a data engineering team that partners with data scientists to develop insights.
- Good exposure to production processes such as change management, incident management, and escalation handling.
- Hands-on experience with major Hadoop ecosystem components, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, IBM DataStage, and Flume, plus knowledge of the MapReduce/HDFS framework.
- Hands-on experience with AWS and Azure clouds, and with installing, configuring, maintaining, monitoring, tuning, and troubleshooting Hadoop clusters across development, test, and production environments.
- Defined file system layouts and dataset permissions.
- Monitored local file system disk usage and log files, and automated log cleanup with scripts.
- Extensive knowledge of front-end technologies such as HTML, CSS, and JavaScript.
- Good working knowledge of OOA/OOD using UML and designing use cases.
- Good communication skills, strong work ethic, and the ability to work efficiently in a team, with good leadership skills.
TECHNICAL SKILLS
Big Data: Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, HBase, MongoDB, Flume, Zookeeper, Oozie.
Operating Systems: Windows, Ubuntu, Red Hat Linux, UNIX
Java Technologies: Java, J2EE, JDBC, JavaScript, SQL, PL/SQL
Programming or Scripting Languages: Java, Python, C, SQL, UNIX shell scripting
Database: MS-SQL, MySQL, Oracle, MS-Access
Middleware: WebSphere, TIBCO
IDEs & Utilities: Eclipse, NetBeans, JCreator
Protocols: TCP/IP, HTTP and HTTPS.
Testing: Quality Center, WinRunner, LoadRunner, QTP
Frameworks: Hadoop, PySpark, Cassandra
PROFESSIONAL EXPERIENCE
Confidential
Hadoop DataStage Engineer
Responsibilities:
- Performed a fresh installation of a 10-node Hadoop HDP 2.6 cluster
- Installed and configured IBM BigIntegrate (DataStage 11.5) and used it to import data from SQL Server into HDFS
- Worked with MongoDB to import the FirstChat database into Hadoop
- Created MLlib models and PySpark jobs to study correlations across databases (a minimal sketch follows this list)
- Used Sqoop to import SQL tables into Hive
- Worked with AWS, Azure, and Spark for machine learning algorithms
- Installed and upgraded Ambari 2.6.1 and set up all Hadoop components
- Worked independently on the project, writing Python programs to map employees across different tables
- Worked with RDBMS sources and star schemas
- Built DataStage transformations that read data through SQL queries and loaded it into Hive
- Assisted in installing Kerberos and LDAP to secure the cluster
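
A minimal PySpark sketch of the kind of cross-database correlation study described above; the table and column names (hr.employees, sales.orders, and their fields) are hypothetical placeholders, not the actual project schema.

```python
# Sketch: pairwise correlations across columns joined from two
# Hive-backed databases. All table/column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.stat import Correlation

spark = (SparkSession.builder
         .appName("db-correlation")
         .enableHiveSupport()
         .getOrCreate())

# Join records from two databases on a shared key.
df = spark.sql("""
    SELECT e.salary, e.tenure_years, o.order_total
    FROM hr.employees e
    JOIN sales.orders o ON e.emp_id = o.emp_id
""")

# MLlib's Correlation API expects a single vector column.
features = VectorAssembler(
    inputCols=["salary", "tenure_years", "order_total"],
    outputCol="features").transform(df)

matrix = Correlation.corr(features, "features", method="pearson")
print(matrix.head()[0])  # DenseMatrix of pairwise Pearson correlations
```

MLlib's DataFrame-based Correlation API returns the full pairwise matrix in a single Spark action, which keeps this kind of exploratory job small.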
Confidential
Hadoop Developer
Responsibilities:
- Worked on an HDP 2.3 distribution for the development cluster
- Updated databases through JDBC to handle insert, update, and delete requests
- Used Hadoop ecosystem components with Java, Linux, Hive, and MapReduce to process data
- Wrote Java MapReduce jobs for processing XML files (a Python streaming equivalent is sketched after this list)
- Built data warehouse data marts and data hubs
- Performed coding in Java Spring and JUnit, managing code in Bitbucket
- Monitored Hadoop cluster performance on Cloudera 5.7
- Created MLlib models and used AWS S3 buckets for real-time processing
- Wrote Linux scripts to process files in lower environments
- Worked in an Agile environment with Git
- Built a 10-node Hadoop cluster and provided production support for cluster maintenance
- Ran jobs through Linux to move files from a TIBCO server to the Hadoop cluster
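
The XML-processing MapReduce jobs above were written in Java; as an illustration, here is an equivalent Hadoop Streaming sketch in Python, assuming one XML record per input line and a hypothetical <status> tag to count.

```python
#!/usr/bin/env python
# Hadoop Streaming stand-in for the Java MapReduce XML jobs above.
# Assumes one XML record per line; the <status> tag is illustrative.
import sys
import xml.etree.ElementTree as ET

def mapper():
    # Emit "status\t1" for each well-formed record on stdin.
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            record = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip malformed records rather than failing the task
        print("%s\t1" % record.findtext("status", default="UNKNOWN"))

def reducer():
    # Streaming delivers mapper output sorted by key, so a running
    # total per key is enough.
    current, count = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = key, 0
        count += int(value)
    if current is not None:
        print("%s\t%d" % (current, count))

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()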
Confidential, New York, NY
Hadoop Developer
Responsibilities:
- Worked on an HDP 2.3 distribution for the development cluster
- Set up incoming data flows with NiFi and Kafka to receive data into HDFS (see the streaming sketch after this list)
- Built user access control, visualization, and settings modules
- Prepared formatted data for chart controllers
- Developed user access control using a Spring interceptor and JDBC
- Used Hadoop ecosystem components with Python, Spark, Hive, and MapReduce to process data
- Created DataStage 11.0 jobs to ingest data into HDFS and Hive
- Wrote MapReduce jobs for processing XML and flat files
- Used Scala and the Cassandra database for processing jobs, and QlikSense for visualization
- Worked on machine learning with Python
- Built multi-node Hadoop clusters and provided production support for cluster maintenance
- Provided strategic direction to the team
- Assigned work to subordinates
- Completed node replacements and scaled the cluster
- Tracked risks and reported them to the project manager
- Provided project status to senior management
- Administered a 10-node Hortonworks Data Platform cluster with 550 GB of RAM, 10 TB of SSD storage, and 8 cores
- Analyzed the Hadoop stack and various big data analytics tools, including Pig, Hive, the HBase database, and Sqoop
- Performed Hadoop administration, managing cluster maintenance and ZooKeeper
- Monitored cluster performance through Ambari and Ganglia
- Triggered workflows based on time or data availability using the Oozie Coordinator Engine
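
A minimal Spark Structured Streaming sketch of the Kafka-to-HDFS landing flow described above, assuming a Spark 2.x-era cluster with the spark-sql-kafka connector available; the broker address, topic name, and HDFS paths are hypothetical placeholders.

```python
# Sketch: consume a Kafka topic (fed by NiFi) and land raw events in
# HDFS as Parquet. Broker, topic, and paths are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "incoming-events")
          .option("startingOffsets", "earliest")
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"))

# Checkpointing gives exactly-once file output for the sink.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/incoming-events")
         .option("checkpointLocation", "hdfs:///checkpoints/incoming-events")
         .start())

query.awaitTermination()
```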
Confidential, Louisville, KY
Hadoop Admin Lead
Responsibilities:
- Worked on an HDP 2.0 distribution for the development cluster
- Loaded datasets daily from two sources, Oracle and MySQL, into HDFS and Hive respectively
- Worked as Hadoop admin, monitoring the production cluster and implementing cluster changes
- Worked with Amazon Web Services
- Received an average of 80 GB of data per day into the data warehouse and processed it on a 12-node cluster
- Loaded data from Linux systems into HDFS
- Wrote MapReduce and PySpark jobs for cleansing data and applying algorithms (a cleansing sketch follows this list)
- Used the Cassandra database to transform query results into Hadoop HDFS
- Designed scalable big data cluster solutions
- Monitored job status through emails received from cluster health monitoring tools
- Responsible for managing data coming from different sources
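
A minimal PySpark cleansing sketch in the spirit of the daily Oracle/MySQL loads described above; the landing path, table names, and column names are hypothetical placeholders.

```python
# Sketch: cleanse one daily landed batch and append it to a Hive
# warehouse table. All names and paths are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("daily-cleanse")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.read.parquet("hdfs:///landing/orders/2024-01-01")

cleansed = (raw
            .dropDuplicates(["order_id"])                  # de-duplicate on key
            .filter(F.col("order_total").isNotNull())      # drop incomplete rows
            .withColumn("customer_name",
                        F.trim(F.upper(F.col("customer_name"))))  # normalize text
            .withColumn("load_date", F.current_date()))    # audit column

# Append the cleansed batch to the Hive warehouse table.
cleansed.write.mode("append").saveAsTable("warehouse.orders_clean")
```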