Hadoop Developer Resume
Danville, PA
SUMMARY
- 7+ years of professional experience, including 3+ years as a Software Engineer/Java Developer and 4+ years in Big Data analytics as a Hadoop/Big Data Developer.
- Experienced in installing, configuring, and administering Hadoop clusters across major Hadoop distributions.
- Experience in installing, configuring, and using Apache Hadoop ecosystem components such as MapReduce, Hive, Tez, Pig, Sqoop, Flume, Spark, Oozie, Ranger, and Kerberos.
- Experience working with Hive data and extending the Hive library with custom UDFs to query data in non-standard formats (see the UDF sketch following this summary).
- Experience in optimization and performance tuning of MapReduce jobs, Pig jobs, and Hive queries.
- Used compression techniques (Snappy) with appropriate file formats to reduce the storage footprint in HDFS.
- Working knowledge of HDFS admin shell commands.
- Experience in tuning and troubleshooting performance issues in Hadoop clusters.
- Experience in importing and exporting data between HDFS and relational databases using Sqoop.
- Experience in varied platforms like Windows, UNIX, Linux.
- Well acquainted with gathering user requirements and preparing technical and functional specification documents.
- Extensively involved in unit testing and preparing test plans.
- Expertise in job workflow scheduling and cluster coordination tools such as Oozie and ZooKeeper.
- Experience in managing and reviewing Hadoop log files.
- Secured the cluster by integrating Kerberos with Active Directory (AD), protecting it from unauthorized access.
- Involved in creating Hive tables, loading data, and analyzing it using Hive queries.
- Developed Hive queries to process data and generate data cubes for visualization.
- Experience using Flume to collect, aggregate, and store weblog data from different sources such as web servers, mobile devices, and network devices, and push it to HDFS.
- Migrated from Flume to Spark for real-time data and developed a Spark Streaming application in Java to consume data from Kafka and push it into Hive (see the streaming sketch following this summary).
- Experienced in handling large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, efficient joins, and transformations during the ingestion process itself.
- Exported the analyzed data to relational databases using Sqoop so the BI team could visualize it and generate reports.
- Established, maintained, and enforced ETL architecture design principles, techniques, standards, and best practices.
- Drove the technical design of the ETL reference architecture to ensure high data quality, strong data integration performance, and reliable error recovery/handling.
- Hands-on experience in AWS provisioning and AWS services such as Security Groups, EBS, EC2, S3, Glacier, Lambda, EMR, CloudWatch, and Elastic Container Service (Docker containers).
- Created S3 buckets and managed their policies, used S3 and Glacier for storage, backup, and archiving in AWS, and set up and maintained Auto Scaling AWS stacks.
- Experience with AWS creating EC2 instances, security groups, EC2 Container Service clusters, and Elastic Block Store volumes.
- In-depth understanding of Data Structures and Algorithms.
- Familiar with Java virtual machine (JVM) and multi-threaded processing.
- Experience working with IDEs (IntelliJ, Eclipse) and source control systems like Git.
- Experience developing in an Agile/Scrum environment.
- Experience working with different file formats such as Parquet, JSON, SequenceFile, and RCFile.
- Familiarity with development best practices such as code reviews and unit testing.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
- Excellent analytical, problem solving, communication and interpersonal skills, with ability to interact with individuals at all levels.
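Below is a minimal sketch of the kind of custom Hive UDF described above; the class name, normalization logic, and registration commands are illustrative assumptions, not details taken from the projects that follow.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical Hive UDF: normalizes a free-form text column so it can be
// queried consistently. Registered in Hive with, for example:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
public class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        // Trim surrounding whitespace and lower-case the value.
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Once registered, such a UDF can be called directly in HiveQL, e.g. `SELECT normalize_text(raw_col) FROM some_table`.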
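A minimal sketch of the Java Spark Streaming path from Kafka into Hive mentioned above, using the spark-streaming-kafka-0-10 integration; the broker address, topic, consumer group, schema, and Hive table name are placeholders rather than details of the actual application.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaToHiveStreaming {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaToHive");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");   // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "weblog-consumer");         // placeholder group id
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream from the (placeholder) "weblogs" topic.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("weblogs"), kafkaParams));

        StructType schema = DataTypes.createStructType(Collections.singletonList(
            DataTypes.createStructField("raw_event", DataTypes.StringType, true)));

        stream.foreachRDD(rdd -> {
            if (rdd.isEmpty()) {
                return;
            }
            SparkSession spark = SparkSession.builder()
                .config(rdd.context().getConf())
                .enableHiveSupport()
                .getOrCreate();
            JavaRDD<Row> rows = rdd.map(record -> RowFactory.create(record.value()));
            Dataset<Row> events = spark.createDataFrame(rows, schema);
            // Append each micro-batch into an existing Hive table (placeholder name).
            events.write().mode("append").insertInto("weblogs_raw");
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Each micro-batch is converted to a DataFrame and appended to an existing Hive table; in practice the raw Kafka value would be parsed into typed columns before the insert.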
TECHNICAL SKILLS
Hadoop ecosystem tools: MapReduce, HDFS, YARN, Tez, Pig, Hive, HBase, Sqoop, ZooKeeper, Oozie, Hue, Storm, Kafka, Spark, Flume, Ranger, Impala, NiFi, and Kerberos
Spark: Spark Streaming, Spark SQL
Programming Languages: Java, Scala, Python, Shell Scripting
Source Control Tools: Git, Bitbucket.
Build Tools: Maven, Ant.
Deployment Tools: Puppet, Chef, Ansible
CI & Ticketing Tools: Jenkins, Jira
Monitoring Tools: Ambari, Cloudera Manager, Nagios, Ganglia.
Web Technologies: JavaScript, XML, HTML.
Databases: SQL Server, MySQL, MongoDB, Cassandra
Operating Systems: Linux (RHEL, Ubuntu), Unix, Windows, Mac.
Other Concepts: OOP, Data Structures, Algorithms, Software Engineering
PROFESSIONAL EXPERIENCE
Confidential, Danville PA
Hadoop Developer
Responsibilities:
- Configuration, administration, maintenance, performance tuning, monitoring and troubleshooting of Hadoop Cluster using Ambari.
- Worked with business requirements to identify and understand source data systems.
- Mapped source system data to data warehouse tables.
- Developed and tested ETL processes.
- Defined and captured metadata and rules associated with ETL processes.
- Adapted ETL processes to accommodate changes in source systems and new business user requirements.
- Configured ZooKeeper to coordinate the servers in the cluster and maintain data consistency.
- Developed shell scripts that create a node with all basic network requirements.
- Implemented Kerberos for authenticating all services in the Hadoop cluster.
- Implemented Ranger for authorization of services in the Hadoop cluster.
- Worked on analyzing Hadoop cluster data using different big data analytics tools, including Kafka, Pig, Hive, and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
- Streamed data in real time using Spark with Kafka.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Worked within the Apache Hadoop framework, utilizing Twitter statistics to ingest the data from a streaming application program interface (API), automate processes by creating Oozie workflows, and draw conclusions about consumer sentiment based on data patterns found using Hive for external client use.
- Worked on building BI reports in Tableau with Spark using SparkSQL.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in loading data from LINUX file system to HDFS.
- Imported and exported data into HDFS using Sqoop and Kafka.
- Experience working on processing unstructured data using Pig.
- Implemented partitioning, dynamic partitions, and bucketing in Hive (see the sketch following this section).
Environment: shell scripting, CentOS 6.5, SSH, DistCp, HDFS, MapReduce, YARN, HBase, Kudu, Pig, Hive, Sqoop, Flume, ZooKeeper, Spark, Storm, Hue, Impala, Kafka, Oozie, Ranger, Kerberos, Linux, Talend, Unix, Git, Eclipse, JUnit.
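The sketch below illustrates the Hive partitioning and bucketing pattern referenced in this section, issued over HiveServer2 JDBC; the host, credentials, table, columns, and bucket count are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch of creating a partitioned, bucketed Hive table and loading it with
// dynamic partitioning. All names here are placeholders for illustration.
public class HivePartitioningSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hiveserver2:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Partitioned by load date, bucketed by user id for faster joins and sampling.
            stmt.execute("CREATE TABLE IF NOT EXISTS weblogs_curated ("
                    + " user_id STRING, url STRING, response_code INT)"
                    + " PARTITIONED BY (event_date STRING)"
                    + " CLUSTERED BY (user_id) INTO 32 BUCKETS"
                    + " STORED AS ORC");

            // Dynamic partitioning: partition values come from the SELECT itself.
            stmt.execute("SET hive.exec.dynamic.partition=true");
            stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
            stmt.execute("INSERT INTO TABLE weblogs_curated PARTITION (event_date)"
                    + " SELECT user_id, url, response_code, event_date FROM weblogs_raw");
        }
    }
}
```

Partitioning by a date column lets queries prune whole directories, while bucketing keeps rows with the same key co-located, which helps joins and sampling.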
Confidential
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed Pig UDFs to pre-process the data for analysis.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing (see the sketch following this section).
- Experienced in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Developed job workflow in Oozie to automate the tasks of loading the data into HDFS.
- Responsible for creating Hive tables, loading data and writing Hive queries.
- Effectively involved in creating the partitioned tables in Hive.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Configured Sqoop and developed scripts to extract data from SQL Server into HDFS.
- Expertise in exporting analyzed data to relational databases using Sqoop.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Managing and scheduling Jobs on a Hadoop cluster.
- Involved in defining job flows, managing and reviewing log files.
- Provided cluster coordination services through ZooKeeper.
- Performed data modeling and database optimization.
- Understood and implemented schemas.
- Interpreted and wrote complex SQL queries and stored procedures.
- Proactively monitored systems for optimum performance and capacity constraints.
- Applied development skills to complete assigned tasks.
- Responsible for running Hadoop streaming jobs to process terabytes of XML data.
- Gained experience in managing and reviewing Hadoop log files.
Environment: shell scripting, CentOS 6.5, SSH, DistCp, HDFS, MapReduce, YARN, HBase, Kudu, Pig, Hive, Sqoop, Flume, ZooKeeper, Spark, Storm, Hue, Impala, Kafka, Oozie, Sentry, Kerberos, Linux, Unix, Git, Eclipse, JUnit.
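A condensed sketch of the kind of map-only MapReduce cleansing job described in this section; the delimiter, expected field count, and input/output paths are assumptions for illustration.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only cleansing job: drops malformed pipe-delimited records and trims whitespace.
public class CleanseRecordsJob {

    public static class CleanseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 8; // assumed record width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\|", -1);
            if (fields.length != EXPECTED_FIELDS) {
                context.getCounter("cleansing", "malformed").increment(1);
                return; // skip malformed rows
            }
            StringBuilder cleaned = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) cleaned.append('|');
                cleaned.append(fields[i].trim());
            }
            context.write(NullWritable.get(), new Text(cleaned.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cleanse-records");
        job.setJarByClass(CleanseRecordsJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0);                 // map-only job
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```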
Confidential
Hadoop Consultant
Responsibilities:
- Worked extensively on creating MapReduce jobs to power data for search and aggregation.
- Gained exposure to Apache Spark; responsible for migrating the code base from the Hortonworks platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift and DynamoDB.
- Extensively used Pig for data cleansing and developed an enterprise application using Scala.
- Created partitioned tables in Hive.
- Developed Pig UDFs to pre-process the data for analysis (see the sketch following this section).
- Worked with business teams and created Hive queries for ad hoc access.
- Mentored analysts and the test team on writing Hive queries.
- Generated reports and predictions using the BI tool Tableau and integrated data using Talend.
- Built a framework for storing and processing input data from various sources.
- Maintained job status and configuration in a relational table (MySQL) for tracking and stored them in HBase.
- Purged records older than the business-defined retention period and archived those records into a file.
- Involved in various life cycle phases, from requirements analysis to implementation.
- Performed data analytics to derive profile attributes using business rules.
- Performed data standardization on the successfully validated data and stored it in Hive.
- Performed mandatory data and field validation on the incoming feed.
- Addressed code review comments after lead review and fixed the issues.
- Implemented Oozie for writing workflows and scheduling jobs.
Environment: Hadoop 1.x, Hive, Pig, HBase, Sqoop, Linux, and Flume
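A minimal sketch of a pre-processing Pig UDF of the kind mentioned in this section; the normalization rule and class name are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical Pig UDF: trims and upper-cases a chararray field so downstream
// grouping and joins are not affected by inconsistent casing or whitespace.
public class NormalizeField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

After `REGISTER udfs.jar;` in a Pig script, it could be used as, e.g., `B = FOREACH A GENERATE NormalizeField(name);` (script and field names are placeholders).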
Confidential
Bigdata Developer
Responsibilities:
- Worked on analyzing the Hadoop stack and different big data analytics tools, including Pig, Hive, HBase, and Sqoop.
- Implemented a nine-node Hadoop cluster on Red Hat Linux.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, JVM tuning, and map/reduce slot configuration.
- Created HBase tables to store variable data formats of PII data coming from different portfolios (see the sketch following this section).
- Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats to text files.
- Implemented best income logic using Pig scripts and UDFs.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Hive and Pig queries.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
Environment: Hadoop 1.x, HDFS, Hive, Pig, Sqoop, HBase, Shell Scripting, Ubuntu, Red Hat Linux
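A brief sketch of writing a row into an HBase table with the Java client API; the table name, column family, qualifiers, and values are placeholders rather than the actual PII schema.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch of a single put into an HBase table (placeholder schema).
public class HBaseWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("customer_profile"))) {

            Put put = new Put(Bytes.toBytes("cust-0001")); // row key
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("sysprin"), Bytes.toBytes("ABC123"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("portfolio"), Bytes.toBytes("retail"));
            table.put(put);
        }
    }
}
```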
Confidential
Java Developer
Responsibilities:
- Involved in the complete software development life cycle (SDLC) of the application, from requirements analysis to testing.
- Developed modules based on the Struts MVC architecture.
- Developed UI using JavaScript, JSP, HTML, and CSS for interactive cross browser functionality and complex user interface.
- Created business logic using servlets and session beans and deployed them on WebLogic Server (see the sketch following this section).
- Used the Struts MVC framework for application design.
- Created complex SQL queries, PL/SQL stored procedures, and functions for the back end.
- Prepared the Functional, Design and Test case specifications.
- Involved in writing Stored Procedures in Oracle to do some database side validations.
- Performed unit testing, system testing, and integration testing.
- Developed unit test cases; used JUnit for unit testing of the application.
- Provided technical support for production environments: resolved issues, analyzed defects, and provided and implemented solutions, resolving higher-priority defects as per the schedule.
- Used Maven for building the enterprise application modules.
- Used Log4j to monitor the error logs.
- Used SVN for Version control.
Environment: Java, JSP, Servlets, WebLogic, Oracle, JUnit, SQL, XML, Maven, Log4j, SVN.
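A minimal sketch of the servlet-based request handling described in this role; the servlet name, request parameter, and response body are placeholders, not the actual application logic.

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet illustrating the request/response flow; real business
// logic would delegate to session beans deployed alongside it.
public class AccountStatusServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String accountId = request.getParameter("accountId"); // placeholder parameter
        response.setContentType("text/html");
        try (PrintWriter out = response.getWriter()) {
            out.println("<html><body>");
            out.println("<p>Status for account " + accountId + ": ACTIVE</p>"); // placeholder output
            out.println("</body></html>");
        }
    }
}
```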