Big Data Hadoop Developer Resume
Columbus, OH
SUMMARY
- 6 years of experience in analysis, design, development, and implementation of Java/J2EE applications and the Hadoop technology stack, with hands-on experience in HDFS, the MapReduce framework, and Hadoop ecosystem tools such as Pig, Hive, Spark, HBase, Sqoop, Flume, ZooKeeper, Oozie, Solr, Impala, Falcon, Ranger, and Ambari, along with Git, Maven, Avro, JSON, Chef, Scala, and shell scripting.
- Hands-on experience writing Pig Latin scripts to perform ETL operations.
- Hands-on experience performing real-time analytics on big data using HBase and Cassandra.
- Experience in using Flume to stream data into HDFS.
- Experience with the Oozie workflow engine, running workflow jobs with actions that launch Hadoop MapReduce and Pig jobs.
- Good practical understanding of cloud infrastructure such as Amazon Web Services (AWS).
- Experienced in configuring and monitoring large clusters on different distributions such as Cloudera and Hortonworks.
- Monitored multiple Hadoop cluster environments using Cloudera Manager and Ganglia.
- Experience in Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).
- Extensive experience in middle-tier development using J2EE technologies such as JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, and EJB.
- Expertise in loading data into HDFS and HBase using MapReduce jobs.
- Experience in processing large volumes of structured, semi-structured, and unstructured data using MapReduce.
- Extensive knowledge of the Hadoop distributions Cloudera CDH 3, 4, and 5 and Hortonworks.
- Hands on experience in working with different file formats such as Text, JSON, XML, Avro, Parquet and Sequence files.
- Good experience importing and exporting data between Hadoop and various RDBMSs using Sqoop.
- In-depth understanding of Data Structure and Algorithms in Java.
- Hands on experience on NoSQL databases like HBase and Cassandra.
- Experience in methodologies such as Agile, Scrum, and test-driven development.
- Good knowledge of Storm, Spark, Impala, and YARN.
- Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML methodology; good knowledge of J2EE design patterns, Java web frameworks, and core Java design patterns, with special emphasis on multi-threading, concurrency, and scheduling.
- Experience in using IDEs such as Eclipse and NetBeans.
- Implemented real-time data ingestion into processing clusters using Kafka.
- Streamed data in real time using Spark with Kafka (see the sketch at the end of this summary).
- Imported and exported data to and from HDFS using Sqoop and Kafka.
- Development experience with DBMSs such as Oracle, MS SQL Server, Teradata, and MySQL.
- Support development, testing, and operations teams during new system deployments.
- Evaluate and propose new tools and technologies to meet the needs of the organization.
- An excellent team player and self-starter with good communication skills and proven abilities to finish tasks before target deadlines.
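Illustrative sketch for the Spark-with-Kafka streaming bullet above: a minimal Java consumer built on the spark-streaming-kafka-0-10 integration. The broker address, group id, and topic name are hypothetical placeholders, and the per-batch record count stands in for whatever transformation a real pipeline would apply.

```java
import java.util.*;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaSparkStreamingSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-stream-sketch");
        // micro-batches every 10 seconds
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");       // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "log-consumer");                 // placeholder consumer group
        kafkaParams.put("auto.offset.reset", "latest");

        Collection<String> topics = Collections.singletonList("app-logs"); // placeholder topic

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // count records per micro-batch as a stand-in for real transformation logic
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```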
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Impala, Pentaho
Operating Systems: Windows, Ubuntu, RedHat Linux, Unix
Programming/ Scripting Languages: Java, Python, Unix shell scripting, JavaScript
Databases/Database Languages: MS-Access 2010, Microsoft SQL Server, Oracle 9i/11g, MySQL 5.6, NoSQL (HBase), SQL, PL/SQL
Web Technologies: JSP, Servlet, JSF, Java Beans, EJB, SOAP, WSDL, XML, HTML, CSS
Web Servers: Apache Tomcat 6, WebLogic
ETL: Pentaho
Frameworks: J2EE, Hibernate, Spring, Apache Maven, Struts
IDEs: Eclipse, Oracle Developer
Monitoring and Reporting: Ganglia, Custom shell scripts
Testing tools: JUnit, MRUnit
Statistical Programming Tool/Predictive Modelling Tool: R, Weka, Rapid Miner
PROFESSIONAL EXPERIENCE
Confidential, Columbus, OH
Big Data Hadoop Developer
Responsibilities:
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Integrated the scheduler with Oozie workflows to pull data from multiple data sources in parallel using fork actions.
- Created a data pipeline of MapReduce programs using chained mappers.
- Implemented optimized join strategies.
- Used Sqoop to import data into HDFS from a MySQL database and vice versa.
- Responsible for importing log files from various sources into HDFS using Flume.
- Created a customized BI tool for the management team to perform query analytics using HiveQL.
- Used Sqoop to load data from MySQL to HDFS on a regular basis.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Created Hive generic UDFs to process business logic that varies by policy (see the UDF sketch following this project).
- Moved relational database data into Hive dynamically partitioned tables via staging tables using Sqoop.
- Optimized Hive queries using partitioning and bucketing techniques to control the data.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Ran batch processes using Pig scripts and developed Pig UDFs for data manipulation according to business requirements.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Spark, AWS EC2, S3, RDS, Kafka, Solr, Linux, Cloudera, Big Data, Java collections, Python, SQL, NoSQL, Informatica, Cassandra, Talend, Machine Learning, HBase.
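Illustrative sketch for the Hive generic UDF bullet above: a minimal GenericUDF in Java that masks all but the last four characters of a string column. The mask_policy_id name and the masking rule are hypothetical, not taken from the original project.

```java
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

/** Hypothetical generic UDF: mask_policy_id(str) keeps the last four characters and masks the rest. */
public class MaskPolicyIdUDF extends GenericUDF {

    private StringObjectInspector inputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1 || !(arguments[0] instanceof StringObjectInspector)) {
            throw new UDFArgumentException("mask_policy_id expects exactly one string argument");
        }
        inputOI = (StringObjectInspector) arguments[0];
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object arg = arguments[0].get();
        if (arg == null) {
            return null;
        }
        String value = inputOI.getPrimitiveJavaObject(arg);
        int keep = Math.min(4, value.length());
        // replace every leading character with '*', keep the trailing characters visible
        return value.substring(0, value.length() - keep).replaceAll(".", "*")
                + value.substring(value.length() - keep);
    }

    @Override
    public String getDisplayString(String[] children) {
        return "mask_policy_id(" + children[0] + ")";
    }
}
```

A UDF like this would be packaged into a JAR, added to the Hive session with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before being called from HiveQL.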
Confidential, Columbus, OH
Big Data Hadoop Developer
Responsibilities:
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with major components of the Hadoop ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie, and ZooKeeper.
- Implemented a six-node CDH4 Hadoop cluster on CentOS.
- Imported and exported data into HDFS and Hive from different RDBMSs using Sqoop.
- Defined job flows to run multiple MapReduce and Pig jobs using Oozie.
- Imported log files into HDFS using Flume and loaded them into Hive tables for querying.
- Monitored running MapReduce programs on the cluster.
- Responsible for loading data from UNIX file systems to HDFS.
- Used HBase-Hive integration and wrote multiple Hive UDFs for complex queries.
- Involved in writing APIs to read HBase tables, cleanse the data, and write it to another HBase table.
- Created multiple Hive tables and implemented partitioning, dynamic partitioning, and bucketing in Hive for efficient data access.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
- Ran batch processes using Pig scripts and developed Pig UDFs for data manipulation according to business requirements.
- Wrote programs using the HBase client API.
- Loaded data into HBase using the HBase shell, the HBase client API, Pig, and Sqoop (see the client API sketch following this project).
- Experienced in the design, development, tuning, and maintenance of NoSQL databases.
- Developed unit test cases for Hadoop MapReduce jobs with MRUnit.
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Spark, AWS EC2, S3, RDS, Kafka, Solr, Linux, Cloudera, Big Data, Java collections, Python, SQL, NoSQL, Informatica, Cassandra, Talend, Machine Learning, HBase.
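Illustrative sketch for the HBase client API bullets above, assuming a modern HBase 1.x+ client (CDH4-era code would use HTable directly); the table name, column family, and row key are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("customer_events"))) { // placeholder table

            // write one cell: row key "row-001", column family "cf", qualifier "status"
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("status"), Bytes.toBytes("ACTIVE"));
            table.put(put);

            // read the same cell back
            Get get = new Get(Bytes.toBytes("row-001"));
            Result result = table.get(get);
            String status = Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("status")));
            System.out.println("status = " + status);
        }
    }
}
```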
Confidential
ETL Developer
Responsibilities:
- Involved in business analysis and technical design sessions with business and technical staff to develop the requirements document and ETL design specifications.
- Worked with various relational sources and flat files to populate the data mart.
- Extracted data from various source systems, such as flat files and Oracle databases, and scheduled the workflows.
- Created stored procedures to extract data from flat files.
- Defined the target load order plan and constraint-based loading to load data correctly into different target tables.
- Used Debugger to test the mappings and fix the bugs.
- Involved in Performance Tuning of ETL jobs by tuning the SQL used in Transformations.
- Created, configured, and fine-tuned ETL workflows designed in DTS and MS SQL Server Integration Services (SSIS).
- Used SSIS packages for extracting, cleaning, transforming, and loading data into the data warehouse.
- Experience in event logging, resource management, log files, files and filegroups, security, and backup and recovery.
- Worked extensively on SQL, PL/SQL and UNIX Shell Scripts.
- Involved in Unit Testing, Integration and User Acceptance Testing of mappings.
- Created high-level Technical Design Documents.
Environment: Informatica PowerCenter 8.6.1, Oracle 9i/11g, SQL Developer, Mainframes (VSAM files), UNIX, Windows XP, PL/SQL, HP Quality Center, MicroStrategy, FTP, Autosys.
Confidential
ETL/Java Developer
Responsibilities:
- Involved in debugging, testing and implementation of the System.
- Responsible for database (Oracle) connectivity and persistence using JDBC (see the JDBC sketch following this project).
- Designed and developed user interfaces using JSP, JavaScript, HTML, and the Struts framework.
- Involved in Database design and developing SQL Queries, stored procedures on MySQL.
- Developed ActionForms and Action classes in the Struts framework.
- Programmed session and entity EJBs to handle user info tracking and profile-based transactions.
- Involved in writing JUnit test cases, unit and integration testing of the application.
- Developed user and technical documentation.
- Analyzed the business requirements and prepared the software requirements.
- Worked on development of front-end web applications using the Struts framework, Struts JSP tag libraries, and Struts Tiles.
- Developed web pages using HTML, DHTML, and CSS and implemented client-side validations using JavaScript and AJAX.
- Interacted with users and business analysts to collect and understand the business requirements.
- Designed the ETL process per the requirements and documented it using MS Visio. Demonstrated the ETL process design to the technical lead. Prepared design guidelines for ETL process development using Informatica PowerCenter, PL/SQL, SQL, and shell scripts.
- Worked extensively in Informatica PowerCenter Designer (Source Analyzer, Transformation Developer, Mapping Designer, and Mapplet Designer).
- Responsible for developing complex Informatica mappings for large volumes of data using different transformation types, including connected and unconnected Lookup, Router, Filter, Joiner, Aggregator, Expression, and Update Strategy.
- Responsible for defining mapping parameters, variables, and session parameters according to the requirements and performance considerations.
- Involved in production support and responsible for tuning the Informatica mappings to increase the performance.
- Responsible for code migration and scheduling jobs in Production.
- Involved in Data Cleansing, Data Profiling and created profile reports for business users to review.
- Troubleshot databases, workflows, mappings, sources, and targets to find bottlenecks and improve performance.
Environment: Java (multithreading, collections), J2EE, EJB, UML, SQL, PHP, Sybase, Eclipse, JavaScript, WebSphere, JBoss, HTML5, DHTML, CSS, XML, Log4j, ANT, Struts 1.3.8, JUnit, JSP, Servlets, Rational Rose, Hibernate.
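Illustrative sketch for the JDBC connectivity bullet above: a minimal Java data-access class using a PreparedStatement. The connection URL, credentials, table, and column names are hypothetical placeholders; a real application would typically obtain connections from a JNDI DataSource or connection pool.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ProfileDao {
    // placeholder connection settings; real values would come from JNDI or a properties file
    private static final String URL = "jdbc:oracle:thin:@dbhost:1521:ORCL";
    private static final String USER = "app_user";
    private static final String PASSWORD = "changeit";

    /** Look up a user's profile name by id; returns null when no row matches. */
    public String findProfileName(long userId) throws SQLException {
        String sql = "SELECT profile_name FROM user_profiles WHERE user_id = ?";
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, userId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("profile_name") : null;
            }
        }
    }
}
```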