Hadoop Developer Resume
Charlotte
SUMMARY:
- Hadoop consultant with about 6 years of software development experience, involved in different phases of the software development lifecycle across multiple projects
- About 4 years of experience working with Hadoop ecosystem components such as HDFS, YARN, Hive, Spark, Sqoop, Oozie, HBase, Phoenix, and NiFi
- Experience installing and configuring Hadoop clusters, including recommending hardware configurations to obtain the best performance for a given use case
- Experience developing and scheduling an automated ETL framework to import data into Hadoop from external database systems such as SQL Server, Oracle, and DB2 using Oozie workflows with Sqoop, Hive, and Spark actions
- Designed and developed applications that leverage NoSQL databases such as HBase/Phoenix as a backend data store
- Ability to develop Spark applications using PySpark and Spark SQL, with a good understanding of how to tune job parameters for the best performance (a minimal sketch follows this list)
- Experience deploying several data science models onto the Hadoop platform and operationalizing them by building the required framework with PySpark jobs, shell scripts, and Phoenix and Hive tables
- Basic understanding of object detection with TensorFlow and the ability to perform transfer learning, retraining existing models on new sets of training images to build new object detection models
- Ability to tune components across the cluster for better performance by identifying optimal parameter values based on the cluster configuration
- Experience setting up data HA and DR for Hadoop clusters by configuring backup and replication of data across clusters, using DistCp and HDFS snapshots for HDFS data and HBase native replication for HBase data
- Basic understanding of NiFi and Kafka, with experience setting up a few basic NiFi flows for real-time streaming
- Experience working with structured and unstructured data such as image files, GIS data, and sensor data (.tdms files, .nc files, etc.)
- Experience working in Agile environments and using code versioning tools such as Git and Bitbucket
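To make the Spark items above concrete, the following is a minimal, hypothetical PySpark sketch of importing a relational table over JDBC and landing it as Parquet, with a few commonly tuned parameters set explicitly. The JDBC URL, credentials, table names, and tuning values are placeholders, not details from any actual engagement.

```python
from pyspark.sql import SparkSession

# Minimal PySpark import job; connection details and table names are placeholders.
spark = (
    SparkSession.builder
    .appName("rdbms-import")
    # Example tuning knobs; real values depend on cluster size and data volume.
    .config("spark.sql.shuffle.partitions", "200")
    .config("spark.executor.memory", "4g")
    .config("spark.executor.cores", "4")
    .enableHiveSupport()
    .getOrCreate()
)

# Read a source table over JDBC, splitting the read across executors by a numeric key.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")  # placeholder URL
    .option("dbtable", "SALES.ORDERS")                         # placeholder table
    .option("user", "etl_user")
    .option("password", "********")
    .option("partitionColumn", "ORDER_ID")
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "16")
    .load()
)

# Land the data as a Parquet-backed Hive table for downstream use.
df.write.mode("overwrite").format("parquet").saveAsTable("staging.orders")
```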
TECHNICAL SKILLS:
Big Data: Hadoop, Hive, HBase, Sqoop, Oozie, Spark, Phoenix, NiFi, Kafka, Ambari, Druid
Security: Ranger, Kerberos, Ranger KMS
Programming: Python, PySpark, shell scripting, Scala
Databases: SQL Server, MySQL, Oracle, DB2
Data Science Algorithms: Logistic Regression, LSTM, Object detection using Faster R-CNN and MobileNet SSD
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, Charlotte
- Developed ETL workflows using Oozie and Spark SQL to import all the required data from RDBMS systems into Hadoop
- Worked closely with data scientists on data engineering tasks, transforming the imported data with Hive queries and Spark SQL to generate the materialized views and Parquet files the data scientists required (illustrated in a sketch after this list)
- Created Hive and Spark UDFs as needed to perform the required transformations on the data
- Converted the data science model developed by the data scientists from Python to PySpark and operationalized it with Python and shell scripts, automating the process of running the model on new data as required and saving the results to the final Phoenix tables
- Worked with complex data types such as BLOBs, imported them from an Oracle database, and unpacked them using Python's struct package to store them as individual records in a Hive table
- Worked with UI and backend developers, providing the details needed for the front-end application to retrieve data from the final Phoenix tables through API calls and display it on the UI
- Installed and configured five different Hadoop clusters with about 80 nodes using Hortonworks HDP 2.4.3 on RHEL 7.2 servers, and configured security with AD-integrated Kerberos authentication and Ranger for authorization
- Configured high availability for components such as HDFS, YARN, Hive, and HBase, and set up disaster recovery so the production cluster can fail over to the DR cluster if needed
- Tuned the clusters by setting appropriate values for the parameters that affect cluster performance
- Installed Hive LLAP and configured Hive Interactive Server for faster query response
- Upgraded HDP from 2.4.3 to 2.5.2 and then to 2.6.3 on all the clusters
- Set up YARN queues and HDFS quotas to manage cluster resources efficiently across multiple users and projects
- Developed and scheduled jobs to copy HDFS data from the production cluster to the DR cluster using DistCp (also sketched after this list)
- Commissioned and decommissioned nodes from the clusters whenever required
- Installed and configured multiple third-party tools such as JupyterHub and RStudio on the Hadoop clusters, and installed Anaconda to maintain multiple Python environments on each cluster
- Developed Python and shell scripts to automate ad hoc tasks on the clusters, such as file system checks and service status checks
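As an illustration of the data engineering work described above (the Hive/Spark SQL transformations and UDFs), a minimal, hypothetical PySpark sketch might look like the following; the table names, column names, and UDF are placeholders rather than details from the actual project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("feature-prep").enableHiveSupport().getOrCreate()

# Hypothetical UDF: normalize free-text equipment codes before aggregating.
@F.udf(returnType=StringType())
def normalize_code(code):
    return code.strip().upper() if code is not None else None

# Read the staged data, apply the UDF, and aggregate into the shape requested downstream.
orders = spark.table("staging.orders")  # placeholder table name
daily = (
    orders
    .withColumn("equipment_code", normalize_code(F.col("equipment_code")))
    .groupBy("equipment_code", F.to_date("order_ts").alias("order_date"))
    .agg(F.count("*").alias("order_count"), F.sum("amount").alias("total_amount"))
)

# Persist as a Parquet-backed Hive table that data scientists can query directly.
daily.write.mode("overwrite").format("parquet").saveAsTable("analytics.daily_orders")
```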
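Likewise, a production-to-DR HDFS copy of the kind mentioned above could be automated with a small Python wrapper around DistCp, roughly as sketched here; the cluster names and paths are placeholders.

```python
import subprocess

# Hypothetical source and target paths; a real job would read these from configuration.
SRC = "hdfs://prod-cluster/data/warehouse"
DST = "hdfs://dr-cluster/data/warehouse"

def replicate_to_dr():
    """Run DistCp to sync HDFS data from production to the DR cluster."""
    cmd = ["hadoop", "distcp", "-update", "-delete", SRC, DST]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface DistCp's stderr so a scheduler (cron/Oozie) can alert on failure.
        raise RuntimeError("DistCp failed: " + result.stderr)

if __name__ == "__main__":
    replicate_to_dr()
```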
Systems Engineer
Confidential
- Worked in the telecom domain as part of an operations support team, assisting with the client's billing, rating, and roaming operations, and handled production deployments using TFS
- Automated many of the processes for handling ad hoc provisioning requests from the client using UNIX shell scripting and SQL, reducing the team's effort by 30%
- Set up an HDP cluster and loaded the client's billing data from SQL Server into Hadoop using Sqoop, with the plan of using Hadoop as the data lake for all of the client's customer and billing data
- Developed Oozie workflows and used the Oozie coordinator to schedule and automate the import of data from SQL Server into Hadoop
- Designed Hive databases and tables and tuned them for optimal performance using partitioning and bucketing as required (a sketch follows this list)
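As an illustration of the partitioning and bucketing mentioned in the last item (not DDL from the actual project), a billing-style Hive table might be created roughly as follows; the database, table, column names, bucket count, and HiveServer2 URL are all placeholders.

```python
import subprocess

# Hypothetical DDL: partitioned by billing month so queries can prune partitions,
# and bucketed by account for more even joins and sampling.
DDL = """
CREATE TABLE IF NOT EXISTS billing.invoices (
    account_id BIGINT,
    invoice_id BIGINT,
    amount     DECIMAL(12,2),
    status     STRING
)
PARTITIONED BY (billing_month STRING)
CLUSTERED BY (account_id) INTO 32 BUCKETS
STORED AS ORC
"""

# Run the DDL through beeline; the JDBC URL is a placeholder for the cluster's HiveServer2.
subprocess.run(
    ["beeline", "-u", "jdbc:hive2://hiveserver:10000/default", "-e", DDL],
    check=True,
)
```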