Hadoop Developer Resume
Charlotte
SUMMARY:
- Hadoop consultant with about 6 years of software development experience, involved in different phases of the software development lifecycle across multiple projects
- About 4 years of experience working with Hadoop ecosystem components such as HDFS, YARN, Hive, Spark, Sqoop, Oozie, HBase, Phoenix, and NiFi
- Experience installing and configuring Hadoop clusters, including recommending hardware configurations to obtain the best performance for a given use case
- Experience developing and scheduling an automated ETL framework to import data into Hadoop from external database systems such as SQL Server, Oracle, and DB2 using Oozie workflows with Sqoop, Hive, and Spark actions
- Designed and developed applications that leverage NoSQL databases such as HBase/Phoenix as a backend data store
- Ability to develop Spark applications using PySpark and Spark SQL, with a good understanding of how to tune job parameters for the best performance (a minimal sketch follows this list)
- Experience deploying several data science models onto the Hadoop platform and operationalizing them by building the required framework with PySpark jobs, shell scripts, and Phoenix and Hive tables
- Basic understanding of object detection with TensorFlow and the ability to perform transfer learning, retraining existing models on new sets of training images to build new object detection models
- Ability to tune components across the cluster for better performance by identifying optimal parameter values based on the cluster configuration
- Experience setting up data HA and DR for Hadoop clusters by configuring backup and replication of data across clusters, using DistCp and HDFS snapshots for HDFS data and HBase native replication for HBase data
- Basic understanding of NiFi and Kafka, with experience setting up a few basic NiFi flows for real-time streaming
- Experience working with structured and unstructured data such as image files, GIS data, and sensor data (.tdms files, .nc files, etc.)
- Experience working in Agile environments and using code versioning tools such as Git and Bitbucket
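To make the Spark items above concrete, the following is a minimal, hypothetical PySpark sketch of importing a relational table over JDBC and landing it as Parquet, with a few commonly tuned parameters set explicitly. The JDBC URL, credentials, table names, and tuning values are placeholders, not details from any actual engagement.

```python
from pyspark.sql import SparkSession

# Minimal PySpark import job; connection details and table names are placeholders.
spark = (
    SparkSession.builder
    .appName("rdbms-import")
    # Example tuning knobs; real values depend on cluster size and data volume.
    .config("spark.sql.shuffle.partitions", "200")
    .config("spark.executor.memory", "4g")
    .config("spark.executor.cores", "4")
    .enableHiveSupport()
    .getOrCreate()
)

# Read a source table over JDBC, splitting the read across executors by a numeric key.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")  # placeholder URL
    .option("dbtable", "SALES.ORDERS")                         # placeholder table
    .option("user", "etl_user")
    .option("password", "********")
    .option("partitionColumn", "ORDER_ID")
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "16")
    .load()
)

# Land the data as a Parquet-backed Hive table for downstream use.
df.write.mode("overwrite").format("parquet").saveAsTable("staging.orders")
```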
TECHNICAL SKILLS:
Big Data: Hadoop, Hive, HBase, Sqoop, Oozie, Spark, Phoenix, NiFi, Kafka, Ambari, Druid
Security: Ranger, Kerberos, Ranger KMS
Programming: Python, PySpark, shell scripting, Scala
Databases: SQL Server, MySQL, Oracle, DB2
Data Science Algorithms: Logistic Regression, LSTM, Object detection using Faster R-CNN and MobileNet SSD
PROFESSIONAL EXPERIENCE:
Hadoop Developer
Confidential, Charlotte
- Developed ETL workflows using Oozie and Spark SQL to import all the required data from RDBMS systems into Hadoop
- Worked closely with data scientists on data engineering tasks, transforming the imported data with Hive queries and Spark SQL to generate the materialized views and Parquet files the data scientists required (illustrated in a sketch after this list)
- Created Hive and Spark UDFs as needed to perform the required transformations on the data
- Converted the data science model developed by the data scientists from Python to PySpark and operationalized it with Python and shell scripts, automating the process of running the model on new data as required and saving the results to the final Phoenix tables
- Worked with complex data types such as BLOBs, imported them from an Oracle database, and unpacked them using Python's struct package to store them as individual records in a Hive table
- Worked with UI and backend developers, providing the details needed for the front-end application to retrieve data from the final Phoenix tables through API calls and display it on the UI
- Installed and configured five different Hadoop clusters with about 80 nodes using Hortonworks HDP 2.4.3 on RHEL 7.2 servers, and configured security with AD-integrated Kerberos authentication and Ranger for authorization
- Configured high availability for components such as HDFS, YARN, Hive, and HBase, and set up disaster recovery so the production cluster can fail over to the DR cluster if needed
- Tuned the clusters by setting appropriate values for the parameters that affect cluster performance
- Installed Hive LLAP and configured Hive Interactive Server for faster query response
- Upgraded HDP from 2.4.3 to 2.5.2 and then to 2.6.3 on all the clusters
- Set up YARN queues and HDFS quotas to manage cluster resources efficiently across multiple users and projects
- Developed and scheduled jobs to copy HDFS data from the production cluster to the DR cluster using DistCp (also sketched after this list)
- Commissioned and decommissioned nodes from the clusters whenever required
- Installed and configured multiple third-party tools such as JupyterHub and RStudio on the Hadoop clusters, and installed Anaconda to maintain multiple Python environments on each cluster
- Developed Python and shell scripts to automate ad hoc tasks on the clusters, such as file system checks and service status checks
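As an illustration of the data engineering work described above (the Hive/Spark SQL transformations and UDFs), a minimal, hypothetical PySpark sketch might look like the following; the table names, column names, and UDF are placeholders rather than details from the actual project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("feature-prep").enableHiveSupport().getOrCreate()

# Hypothetical UDF: normalize free-text equipment codes before aggregating.
@F.udf(returnType=StringType())
def normalize_code(code):
    return code.strip().upper() if code is not None else None

# Read the staged data, apply the UDF, and aggregate into the shape requested downstream.
orders = spark.table("staging.orders")  # placeholder table name
daily = (
    orders
    .withColumn("equipment_code", normalize_code(F.col("equipment_code")))
    .groupBy("equipment_code", F.to_date("order_ts").alias("order_date"))
    .agg(F.count("*").alias("order_count"), F.sum("amount").alias("total_amount"))
)

# Persist as a Parquet-backed Hive table that data scientists can query directly.
daily.write.mode("overwrite").format("parquet").saveAsTable("analytics.daily_orders")
```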
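Likewise, a production-to-DR HDFS copy of the kind mentioned above could be automated with a small Python wrapper around DistCp, roughly as sketched here; the cluster names and paths are placeholders.

```python
import subprocess

# Hypothetical source and target paths; a real job would read these from configuration.
SRC = "hdfs://prod-cluster/data/warehouse"
DST = "hdfs://dr-cluster/data/warehouse"

def replicate_to_dr():
    """Run DistCp to sync HDFS data from production to the DR cluster."""
    cmd = ["hadoop", "distcp", "-update", "-delete", SRC, DST]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface DistCp's stderr so a scheduler (cron/Oozie) can alert on failure.
        raise RuntimeError("DistCp failed: " + result.stderr)

if __name__ == "__main__":
    replicate_to_dr()
```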
Systems Engineer
Confidential
- Worked in the telecom domain as part of an operations support team, assisting with the client's billing, rating, and roaming operations, and handled production deployments using TFS
- Automated many of the processes for handling ad hoc provisioning requests from the client using UNIX shell scripting and SQL, reducing the team's effort by 30%
- Set up an HDP cluster and loaded the client's billing data from SQL Server into Hadoop using Sqoop, with the plan of using Hadoop as the data lake for all of the client's customer and billing data
- Developed Oozie workflows and used the Oozie coordinator to schedule and automate the import of data from SQL Server into Hadoop
- Designed Hive databases and tables and tuned them for optimal performance using partitioning and bucketing as required (a sketch follows this list)
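As an illustration of the partitioning and bucketing mentioned in the last item (not DDL from the actual project), a billing-style Hive table might be created roughly as follows; the database, table, column names, bucket count, and HiveServer2 URL are all placeholders.

```python
import subprocess

# Hypothetical DDL: partitioned by billing month so queries can prune partitions,
# and bucketed by account for more even joins and sampling.
DDL = """
CREATE TABLE IF NOT EXISTS billing.invoices (
    account_id BIGINT,
    invoice_id BIGINT,
    amount     DECIMAL(12,2),
    status     STRING
)
PARTITIONED BY (billing_month STRING)
CLUSTERED BY (account_id) INTO 32 BUCKETS
STORED AS ORC
"""

# Run the DDL through beeline; the JDBC URL is a placeholder for the cluster's HiveServer2.
subprocess.run(
    ["beeline", "-u", "jdbc:hive2://hiveserver:10000/default", "-e", DDL],
    check=True,
)
```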