
Hadoop DataStage Engineer Resume


SUMMARY

  • Certified Java programmer with 9+ years of extensive IT experience, including several years in Big Data technologies.
  • Currently researcher, developer, and technical lead of a data engineering team that works with data scientists to develop insights.
  • Good exposure to production processes such as change management, incident management, and managing escalations.
  • Hands-on experience with major components of the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Pig, Sqoop, IBM DataStage, and Flume, plus knowledge of the MapReduce/HDFS framework.
  • Hands-on experience with AWS and Azure, including installation, configuration, maintenance, monitoring, performance tuning, and troubleshooting of Hadoop clusters in development, test, and production environments.
  • Defined file system layouts and data set permissions.
  • Monitored local file system disk space usage and log files, cleaning log files with automated scripts.
  • Extensive knowledge of front-end technologies such as HTML, CSS, and JavaScript.
  • Good working knowledge of OOA and OOD using UML and of designing use cases.
  • Good communication skills, a strong work ethic, and the ability to work efficiently in a team, with good leadership skills.

TECHNICAL SKILLS

Big Data: Hadoop, HDFS, MapReduce, Hive, Sqoop, Pig, HBase, MongoDB, Flume, Zookeeper, Oozie.

Operating Systems: Windows, Ubuntu, Red Hat Linux, Linux, UNIX

Java Technologies: Java, J2EE, JDBC, JavaScript, SQL, PL/SQL

Programming or Scripting Languages: Java, SQL, Unix Shell Scripting, C, Python

Database: MS-SQL, MySQL, Oracle, MS-Access

Middleware: WebSphere, TIBCO

IDEs & Utilities: Eclipse, JCreator, NetBeans

Protocols: TCP/IP, HTTP and HTTPS.

Testing: Quality Center, WinRunner, LoadRunner, QTP

Frameworks: Hadoop, PySpark, Cassandra

PROFESSIONAL EXPERIENCE

Confidential

Hadoop DataStage Engineer

Responsibilities:

  • Performed a fresh installation of a 10-node Hadoop HDP 2.6 cluster.
  • Installed and configured IBM BigIntegrate and IBM DataStage 11.5 and used them to import data from SQL Server into HDFS.
  • Worked with MongoDB to import the FirstChat database into Hadoop.
  • Created MLlib models and PySpark jobs to study correlations across different databases (see the correlation sketch after this list).
  • Used Sqoop to import SQL tables into Hive.
  • Worked on AWS, Azure, and Spark for machine learning algorithms.
  • Installed and upgraded Ambari 2.6.1 and set up all Hadoop components.
  • Worked independently on the project and wrote Python programs to map employees across different tables.
  • Worked with RDBMS sources and star schemas.
  • Performed transformations in DataStage, reading SQL query results and transforming the data into Hive.
  • Assisted in installing Kerberos and LDAP to enable cluster security.
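
A minimal PySpark sketch of the correlation study above, assuming hypothetical Hive table names (firstchat.messages, sales.orders), a hypothetical join key (customer_id), and illustrative numeric columns; the actual schemas are not part of this resume:

    # Illustrative PySpark correlation sketch; table, join key, and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.stat import Correlation

    spark = SparkSession.builder.appName("db-correlation").enableHiveSupport().getOrCreate()

    # Join two Hive tables imported from the source databases (names assumed).
    df = spark.table("firstchat.messages").join(spark.table("sales.orders"), "customer_id")

    # Assemble the numeric columns of interest into a single feature vector.
    assembler = VectorAssembler(inputCols=["message_count", "order_total"], outputCol="features")
    features = assembler.transform(df).select("features")

    # Compute the Pearson correlation matrix with Spark MLlib.
    matrix = Correlation.corr(features, "features").head()[0]
    print(matrix)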

Confidential

Hadoop Developer

Responsibilities:

  • Worked on an HDP 2.3 distribution development cluster.
  • Updated databases through JDBC to handle insert, update, and delete requests.
  • Used the Hadoop ecosystem, Java, Linux, Hive, and MapReduce to process data.
  • Wrote Java MapReduce jobs for processing XML files (see the streaming mapper sketch after this list).
  • Built data warehouse data marts and data hubs.
  • Performed coding in Java Spring and JUnit, with source control in Bitbucket.
  • Monitored Hadoop cluster performance on Cloudera 5.7.
  • Created MLlib models and used AWS S3 buckets for real-time processing.
  • Wrote Linux shell scripts to process files in lower environments.
  • Worked in an Agile environment with Git.
  • Built a 10-node Hadoop cluster and provided production support for cluster maintenance.
  • Ran jobs through Linux to process files from a TIBCO server into the Hadoop cluster.
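
The XML processing above was done in Java MapReduce; as a hedged, simplified sketch of the same idea using a Hadoop Streaming mapper in Python (the one-record-per-line layout and tag names such as <event>, <id>, and <status> are assumptions):

    #!/usr/bin/env python
    # Simplified Hadoop Streaming mapper sketch for XML records, one record per line.
    # Record layout and tag names (<event>, <id>, <status>) are assumed for illustration.
    import sys
    import xml.etree.ElementTree as ET

    for line in sys.stdin:
        line = line.strip()
        if not line.startswith("<event"):
            continue
        try:
            record = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip malformed records
        key = record.findtext("id", default="unknown")
        status = record.findtext("status", default="unknown")
        # Emit tab-separated key/value pairs for the reducer to aggregate.
        print("%s\t%s" % (key, status))

Such a mapper would be submitted with the hadoop-streaming jar alongside a matching reducer.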

Confidential, New York, NY

Hadoop Developer

Responsibilities:

  • Worked on an HDP 2.3 distribution development cluster.
  • Set up incoming data flows in NiFi and Kafka to land data in HDFS (see the ingestion sketch after this list).
  • Built user access control, visualization, and settings modules.
  • Prepared formatted data for chart controllers.
  • Developed user access control using a Spring interceptor and JDBC.
  • Used the Hadoop ecosystem, Python, Spark, Hive, and MapReduce to process data.
  • Created jobs in DataStage 11.0 to ingest data into HDFS and Hive.
  • Wrote MapReduce jobs for processing XML and flat files.
  • Experience with Scala and the Cassandra database for processing jobs, and with QlikSense for visualization.
  • Worked on machine learning in Python.
  • Experience building multi-node Hadoop clusters and providing production support for cluster maintenance.
  • Provided strategic direction to the team.
  • Assigned work to team members.
  • Replaced nodes and scaled the cluster.
  • Tracked risks and reported them to the project manager.
  • Provided project status to senior management.
  • The cluster was a 10-node Hortonworks Data Platform installation with 550 GB of RAM, 10 TB of SSD storage, and 8 cores.
  • Analyzed the Hadoop stack and various big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
  • Performed Hadoop administration work, managing cluster maintenance and ZooKeeper.
  • Monitored cluster performance through Ambari and Ganglia.
  • Triggered workflows based on time or data availability using the Oozie Coordinator Engine.
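
A hedged Spark Structured Streaming sketch of the Kafka-to-HDFS ingestion above; the broker address, topic name, and HDFS paths are assumptions, not details from the project:

    # Illustrative Spark Structured Streaming job: Kafka topic -> HDFS.
    # Broker address, topic, and output paths are assumptions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Subscribe to the Kafka topic carrying the incoming data.
    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "incoming-events")
              .load())

    # Kafka keys/values arrive as bytes; cast to strings before writing.
    events = stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    # Land the stream in HDFS as Parquet, with a checkpoint for recovery.
    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/raw/incoming-events")
             .option("checkpointLocation", "hdfs:///checkpoints/incoming-events")
             .start())
    query.awaitTermination()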

Confidential, Louisville, KY

Hadoop Admin Lead

Responsibilities:

  • Worked on an HDP 2.0 distribution development cluster.
  • Loaded all datasets daily from two sources, Oracle and MySQL, into HDFS and Hive respectively.
  • Worked as Hadoop admin, monitoring the production cluster and applying changes to it.
  • Worked with Amazon Web Services.
  • The data warehouse received about 80 GB of data daily on average; a 12-node cluster was used to process it.
  • Loaded data from Linux systems into HDFS.
  • Wrote MapReduce and PySpark jobs for cleansing data and applying algorithms (see the cleansing sketch after this list).
  • Used the Cassandra database to transform queries for Hadoop HDFS.
  • Designed scalable big data cluster solutions.
  • Monitored job status through email alerts from cluster health monitoring tools.
  • Responsible for managing data coming from different sources.
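
A minimal PySpark sketch of the daily cleansing jobs above, assuming a hypothetical landing path, a customer_id key, an email column, and a warehouse.customers Hive table:

    # Illustrative PySpark cleansing job; paths, table, and column names are assumed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, trim, lower

    spark = SparkSession.builder.appName("daily-cleanse").enableHiveSupport().getOrCreate()

    # Read the day's raw extract landed in HDFS from the Oracle/MySQL loads.
    raw = spark.read.option("header", "true").csv("hdfs:///data/landing/daily/")

    # Standardize text fields, drop rows missing the primary key, and deduplicate.
    clean = (raw.withColumn("email", lower(trim(col("email"))))
                .dropna(subset=["customer_id"])
                .dropDuplicates(["customer_id"]))

    # Append the cleansed records to the warehouse table in Hive.
    clean.write.mode("append").saveAsTable("warehouse.customers")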
