Senior Hadoop Developer Resume
Buffalo, NY
SUMMARY
- Hadoop Developer with almost 8 years of IT experience developing and delivering software using a wide variety of technologies across all phases of the development life cycle.
- Expertise in Java and Big Data technologies, with proven ability in project-based leadership, teamwork, and communication.
- Very good knowledge of object-oriented concepts, with complete software development life cycle experience: requirements gathering, conceptual design, analysis, detailed design, development, mentoring, and system and user acceptance testing.
- Hands-on development and implementation experience on a Big Data Management Platform (BMP) using HDFS, MapReduce, Hive, Pig, Oozie, Apache Kite, and other Hadoop ecosystem components for data storage and retrieval.
- Experience transferring data from structured data stores to HDFS using Sqoop.
- Wrote MapReduce programs to perform data processing and analysis.
- Experience in analyzing data with Hive and Pig.
- Worked on Oozie for managing Hadoop jobs.
- Experience in cluster coordination using Zookeeper.
- Experience in loading logs from multiple sources directly into HDFS using Flume.
- Developed batch processing jobs using Java MapReduce, Pig, and Hive (a minimal MapReduce sketch follows this summary).
- Good Knowledge and experience in Hadoop Administration.
- Experience in Installation, Configuration, Testing, Backup, Recovery, Customizing and Maintenance.
- Experience in using Flume to load log files into HDFS.
- Expertise in using Oozie for configuring job flows.
- Expert-level programming skills with the Struts framework, custom tag libraries, Spring tag libraries, and JSTL.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Experience with Hadoop security and access controls (Kerberos, Active Directory).
- Integrated Hadoop clusters with Nagios and Ganglia for monitoring.
- Experience in installing, upgrading, and configuring Red Hat Linux 3.x, 4.x, and 5.x using Kickstart servers and interactive installation.
- Strong experience in RDBMS technologies such as MySQL, Oracle, and Teradata.
- Expertise in designing and implementing disaster recovery plans for Hadoop clusters.
- Experience in scripting for automation, and monitoring using Shell & Perl scripts.
- Implemented advanced procedures such as text analytics and processing using in-memory computing capabilities like Spark.
- Enhanced and optimized product Spark code to aggregate, group, and run data mining tasks using the Spark framework.
- Experience with Puppet and Chef.
- Experience in development of logging standards and mechanism based on Log4J.
- Good understanding of server hardware and hardware architecture.
- Experience in building, deploying, and integrating applications with Ant and Maven.
- Extensive experience building and deploying applications on web/application servers such as WebLogic, WebSphere, and Tomcat.
- Experienced in preparing and executing Unit Test Plan and Unit Test Cases after software development.
- Hands on experience in Agile and Scrum methodologies.
- Extensive development experience in IDEs such as Eclipse, NetBeans, Forte, and STS.
- Expertise in relational databases such as Oracle and MySQL.
- Extensive experience working with customers to gather the information needed to analyze, clarify, and provide data or code fixes for technical problems; building service patches for each version release; performing unit, integration, user acceptance, and system testing; and providing technical solution documents for users.
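Below is a minimal sketch of the kind of Java MapReduce batch job referenced in the summary, assuming a simple comma-delimited input; the class name, field positions, and paths are illustrative, not code from an actual engagement.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EventCountJob {

    // Mapper: emits (eventType, 1) for each well-formed record; malformed lines are skipped.
    public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text eventType = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length > 1) {                      // hypothetical layout: event type in column 2
                eventType.set(fields[1].trim());
                context.write(eventType, ONE);
            }
        }
    }

    // Reducer (also used as combiner): sums the counts per event type.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "event count");
        job.setJarByClass(EventCountJob.class);
        job.setMapperClass(EventMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```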
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, Hive, Pig, Impala, Sqoop, Flume, HBase, Spark, Cassandra, Oozie, Zookeeper, YARN
Programming Languages: Java (JDK 1.4/1.5/1.6), C/C++, MATLAB, Python, R, HTML, SQL, PL/SQL
Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts 1.x/2.x, and JPA
Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey
Client Technologies: jQuery, JavaScript, AJAX, CSS, HTML, XHTML
Operating Systems: UNIX, Linux, Windows
Application Servers: IBM WebSphere, Tomcat, WebLogic
Web technologies: JSP, Servlets, Socket Programming, JNDI, JDBC, JavaBeans, JavaScript, Web Services (JAX-WS)
Databases: NoSQL, Oracle 8i/9i/10g, Microsoft SQL Server 2008/2012, DB2 & MySQL 4.x/5.x
Java IDE: Eclipse 3.x, IBM WebSphere Application Developer, IBM RAD 7.0
Tools: TOAD, SQL Developer, SOAP UI, ANT, Maven, Visio, Rational Rose
PROFESSIONAL EXPERIENCE
Confidential, Buffalo NY
Senior Hadoop Developer
Responsibilities:
- Involved in all phases of the SDLC including analysis, design, development, testing, and deployment of Hadoop cluster.
- Experience with Agile development processes and practices.
- Extensively worked on Oozie and Unix scripts for batch processing and scheduling workflows dynamically.
- Implemented data ingestion from multiple sources like IBM Mainframes, Teradata, Oracle and Netezza using Sqoop, SFTP and MR jobs.
- Developed Sqoop scripts to import and export data from relational sources, handling incremental loads and updates into the HDFS layer.
- Developed transformations and aggregated the data for large data sets using MR, Pig and Hive scripts.
- Implemented partitioning and bucketing in Hive tables and ran scripts in parallel to improve performance.
- Developed test cases in JUnit for unit testing of MR Jobs.
- Explored different BI reporting tools to compare which one best suits the requirements.
- Worked on JDBC connectivity using SQuirreL, ODBC connection setup with MicroStrategy and Toad, and stress testing of Hadoop data for BI reporting tools.
- Implemented a process to automatically update the Hive tables by reading a change file provided by business users.
- Experienced working with different file formats - Avro, Parquet and JSON.
- Experience in using Gzip, LZO, Snappy and Bzip2 compressions.
- Experience in reading and writing files in HDFS using the Java FileSystem API (see the sketch after this list).
- Developed Pig and Hive UDFs based on requirements.
- Developed the workflow jobs using Oozie services to run the MR, Pig and Hive jobs and created JIL scripts to run Oozie jobs.
- Improved performance using advanced joins in Apache Pig and Apache Hive.
- Tuned MapReduce job and configuration parameters to improve performance.
- Copied data between production and lower environments.
- Created reporting views in Impala using Sentry Policy files.
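Below is a minimal sketch of reading and writing HDFS files with the Java FileSystem API mentioned above; the paths are hypothetical, and the Configuration is assumed to pick up cluster settings from core-site.xml/hdfs-site.xml on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();          // loads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        // Write a small file to a hypothetical HDFS path, overwriting if it exists.
        Path out = new Path("/tmp/example/output.txt");
        try (FSDataOutputStream os = fs.create(out, true)) {
            os.write("hello from the HDFS Java API\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read the same file back line by line.
        try (FSDataInputStream is = fs.open(out);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}
```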
Environment: Cloudera Hadoop, Hortonworks Hadoop, HDFS, Hive, Pig, MapReduce, Oozie, Flume, Sqoop, Splunk, Informatica Big Data, Pentaho Kettle, Talend, Greenplum, Platfora, Tableau, Teradata, UNIX, Shell Scripting, Kerberos security
Confidential, New York, NY
Sr Hadoop Developer
Responsibilities:
- Worked on a Hadoop cluster that ranged from 4-8 nodes during the pre-production stage and was extended up to 24 nodes during production
- Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components
- Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Created Hive tables and applied HiveQL on them, which automatically invokes and runs MapReduce jobs
- Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts
- Participated in requirements gathering from the experts and business partners and converted the requirements into technical specifications
- Used ZooKeeper to manage coordination among the clusters (a minimal sketch follows this list)
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that trigger independently based on time and data availability
- Assisted application teams in installing Hadoop updates, operating system patches, and version upgrades when required
- Assisted in cluster maintenance, monitoring, and troubleshooting, and managed and reviewed data backups and log files
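Below is a minimal sketch of ZooKeeper-based coordination of the kind mentioned above, in which each worker registers an ephemeral znode so peers can see which nodes are alive; the connect string, znode paths, and class name are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class WorkerRegistry implements Watcher {
    private static final String CONNECT = "zk1:2181,zk2:2181,zk3:2181"; // hypothetical quorum
    private final CountDownLatch connected = new CountDownLatch(1);
    private ZooKeeper zk;

    public void start(String workerId) throws Exception {
        zk = new ZooKeeper(CONNECT, 15000, this);
        connected.await();                                   // wait for the session to be established

        if (zk.exists("/workers", false) == null) {          // parent znode is persistent
            zk.create("/workers", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        // Ephemeral znode disappears automatically if this worker's session dies.
        zk.create("/workers/" + workerId, new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        List<String> live = zk.getChildren("/workers", false);
        System.out.println("Live workers: " + live);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
            connected.countDown();
        }
    }
}
```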
Environment: Apache Hadoop 2.0.0, Pig 0.11, Hive 0.10, Sqoop 1.4.3, Flume, MapReduce, HDFS, LINUX, Oozie, Cassandra, Hue, HCatalog, Java, Eclipse, VSS, Red Hat Linux.
Confidential, Winston-Salem, NC
Java/Hadoop Engineer
Responsibilities:
- Developed MapReduce jobs in Java for data cleansing and preprocessing.
- Moved data between Oracle and HDFS using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Developed Map Reduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Created Hive tables as per requirements, as internal or external tables defined with appropriate static and dynamic partitions for efficiency.
- Implemented partitioning, bucketing in Hive for better organization of the data.
- Worked with different file formats and compression techniques to determine standards.
- Developed Hive queries and UDFs to analyze and transform the data in HDFS (see the UDF sketch after this list).
- Developed Hive scripts for implementing control table logic in HDFS.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Developed Pig scripts and UDFs as per the business logic.
- Analyzed and transformed data with Hive and Pig.
- Developed Oozie workflows and scheduled them on a monthly basis.
- Designed and developed read lock capability in HDFS.
- Implemented a Hadoop Float equivalent to the Oracle Decimal type.
- Involved in End to End implementation of ETL logic.
- Coordinated effectively with the offshore team and managed project deliverables on time.
- Worked on QA support activities, test data creation and Unit testing activities.
- Monitored and debugged Hadoop jobs and applications running in production.
- Provided user support and application support on the Hadoop infrastructure.
- Reviewed ETL application use cases before onboarding them to Hadoop.
- Evaluated and compared different tools for test data management with Hadoop.
- Helped and directed the testing team to get up to speed on Hadoop application testing.
- Installed a 20-node UAT Hadoop cluster.
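Below is a minimal sketch of a Java Hive UDF of the kind mentioned above, normalizing a string column (trim and lower-case); the class name and logic are illustrative.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class NormalizeUDF extends UDF {
    private final Text result = new Text();

    // Hive resolves this evaluate() method by reflection when the function is invoked.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;          // preserve NULLs rather than turning them into empty strings
        }
        result.set(input.toString().trim().toLowerCase());
        return result;
    }
}
```

Once packaged into a jar, a UDF like this is typically registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.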
Environment: Apache Hadoop 2.0.0, Pig 0.11, Hive 0.10, Sqoop 1.4.3, Flume, MapReduce, HDFS, LINUX, Oozie, Cassandra, Hue, HCatalog, Java, Eclipse, VSS, Red Hat Linux.
Confidential, Austin, TX
Java Developer
Responsibilities:
- Analysis, design, project planning, effort estimation, and development of the FTM application based on MVC using the Struts framework and server-side J2EE technologies.
- Part of the core agile team in developing the application in Agile Development Methodology.
- Involved in mentoring team in technical discussions and Technical review of Design Documents.
- Hands-on code development using core Java, servlets, and the Hibernate framework API.
- Used Hibernate to develop persistent classes following ORM principles (a minimal entity sketch follows this list).
- Developed Hibernate configuration files for establishing database connections and Hibernate mapping files based on POJO classes.
- Developed JUnit test cases and system test cases for all developed modules and classes, and used JMeter for performance testing.
- Used SVN for source control.
- Used Maven for product lifecycle management.
- Involved in code reviews and verified bug analysis reports.
- Created PL/SQL stored procedures, functions, and triggers for the Oracle 11g database.
- Used Eclipse Juno as the IDE and Tomcat 6.0/ 7.0 as the application server.
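Below is a minimal sketch of a Hibernate persistent class of the kind mentioned above; the entity, table, and column names are hypothetical, and JPA annotations are shown here for brevity where the project used XML mapping files based on POJO classes.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "CUSTOMER")
public class Customer {

    @Id
    @GeneratedValue
    @Column(name = "CUSTOMER_ID")
    private Long id;

    @Column(name = "NAME", nullable = false, length = 100)
    private String name;

    @Column(name = "EMAIL", unique = true)
    private String email;

    // Hibernate requires a no-argument constructor.
    public Customer() { }

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
}
```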
Environment: Java, J2EE 1.5, Struts 1.3, Hibernate 3.0, JSP, Servlets, XML, Tomcat 6.0/7.0, JDBC, Oracle SQL Developer, Oracle 11.2.0, JQuery