Sr. Big Data/ Hadoop Developer Resume
Houston, TX
SUMMARY
- 8+ years of professional IT experience, including 4+ years in Big Data ecosystem technologies. Expertise in Big Data technologies as a consultant, with proven capability in project-based teamwork as well as individual development, and good communication skills.
- Excellent understanding/knowledge of Hadoop architecture and its various components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Experience in working with Hadoop clusters using AWS EMR, Cloudera, Pivotal, and Hortonworks distributions.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce (MR), HDFS, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
- Hands-on development and implementation experience in a Big Data Management Platform (BMP) using HDFS, MapReduce, Hive, Pig, Oozie, and other Hadoop-related ecosystem components (Kafka, Spark, etc.) as data storage and retrieval systems.
- Worked with the search relevancy team to improve relevancy and ranking of search results using Solr.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience in managing and reviewing Hadoop log files.
- Experience in analyzing and troubleshooting production logs for performance and load analysis.
- Experience in analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
- Experience in developing, debugging, and testing C# scripts, used to perform the following activities:
- Performing REST/SOA operations
- Azure Data Factory activities
- PowerShell scripting
- Experience parsing JSON, XML and semi-structured data files.
- Extended Hive and Pig core functionality by writing UDFs.
- Able to analyze a MapReduce job and determine how input and output data paths are handled in CDH.
- Good experience installing, configuring, and testing Hadoop ecosystem components.
- Well experienced with the Mapper, Reducer, Combiner, Partitioner, and shuffle-and-sort phases, along with custom partitioning for efficient bucketing.
- Good experience in writing Pig and Hive UDFs according to requirements (a minimal Hive UDF sketch is included after this summary).
- Experience with MapR to develop and build Hadoop-based applications.
- Experience in designing both time-driven and data-driven automated workflows using Oozie.
- Good experience with Hortonworks 2.x.
- Conducted on-site POC and pilot for the IBM Optim product suite for data discovery, subsetting, and data masking.
- Experienced in implementing POCs to migrate iterative MapReduce programs into Spark transformations using Scala.
- Experience composing, orchestrating, and monitoring Data Factory activities and pipelines.
- Hands on experience in Agile and Scrum methodologies.
- Extensive experience in working with customers to gather the information required to analyze and provide data fixes or code fixes for technical problems, and in providing technical solution documents for users.
- Hands-on experience in application development using Java, RDBMSs (Oracle, SQL Server), and Linux shell scripting.
- Worked on multiple stages of the Software Development Life Cycle, including development, component integration, performance testing, deployment, and support/maintenance.
- Quick to adapt to new software applications and products; a self-starter with excellent communication skills and a good understanding of business workflows.
- Expertise in object-oriented analysis and design (OOAD), including UML and the use of various design patterns.
- Working knowledge of SQL, PL/SQL, stored procedures, functions, packages, DB triggers, and indexes.
- Experience with HANA DB and ETL.
- Good experience in designing jobs and transformations and loading data sequentially and in parallel for initial and incremental loads.
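As an illustration of the Hive UDF work mentioned above, here is a minimal sketch of a simple Hive UDF in Java; the package, class, and function behavior are hypothetical and show only the general pattern, not code from a specific project.

```java
package com.example.hive.udf; // hypothetical package name

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF: normalizes a string column by trimming and lower-casing it.
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // pass nulls through unchanged
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Once packaged into a JAR, a UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL queries.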
TECHNICAL SKILLS
Big Data: Hadoop, MapReduce, HDFS, Hive with Tez, Pig, Sqoop, Oozie, ZooKeeper, Flume, Apache Mahout, AWS, YARN, Storm, Kafka, Spark, Impala, Python, Solr, HBase, MongoDB, and Cassandra
Languages: C, C++, Java, SQL, PL/SQL, UML
Databases: Oracle 8i/9i/10g/11g, SQL Server 7.0/2000, DB2, MS Access
Technologies: Java 5, Java 6, AJAX, Log4j, Java Help, Java API, JDBC 2.0, and Java Beans
Methodologies: CMMI, Agile Software development, Six Sigma, Quantitative, Project Management, UML, Design Patterns
Framework: Ajax, Struts 2.0, JUnit, log4j 1.2, MOCK OBJECTS, Hibernate
Application Server: Apache Tomcat 5.x/6.0, JBoss 4.0
Tools: HTML, JavaScript, XML
Testing Tools: NetBeans, Eclipse, WSAD, RAD
Operating System: UNIX, Mac OS X, Windows, Hyper-V
Version Control Tools: CVS, TortoiseSVN
Others: MS Office
PROFESSIONAL EXPERIENCE
Confidential, HOUSTON TX
SR. BIG DATA/ HADOOP DEVELOPER
Responsibilities:
- Experience with professional software engineering practices and best practices for the full software development life cycle including coding standards, code reviews, source control management and build processes.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Responsible for building scalable distributed data solutions using Hadoop.
- Worked closely with various levels of individuals to coordinate and prioritize multiple projects; estimated scope and schedule and tracked projects throughout the SDLC.
- Worked on the BI team on Big Data Hadoop cluster implementation and data integration, developing large-scale system software.
- Worked with Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing (a minimal cleaning-mapper sketch follows this list).
- Worked extensively in creating Map Reduce jobs to power data for search and aggregation.
- Experienced in managing and reviewing Hadoop log files.
- Experienced in running Hadoop streaming jobs to process terabytes of data.
- Designed a data warehouse using Hive
- Handled structured, semi-structured, and unstructured data.
- Worked extensively with Sqoop for importing and exporting the data from HDFS to Relational Database systems and vice-versa.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used Pig for data cleansing.
- Created partitioned tables in Hive.
- Managed and reviewed Hadoop log files.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Responsible for managing data coming from different sources.
- Developed Pig UDFs to pre-process the data for analysis.
- Developed Hive queries for the analysts.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Mentored analysts and the test team in writing Hive queries.
- Involved in database migrations to transfer data from one database to another and in the complete virtualization of many client applications.
- Supported and assisted QA engineers in understanding, testing, and troubleshooting.
- Wrote build scripts using Ant/Maven and participated in the deployment of one or more production systems.
- Provided production rollout support, including monitoring the solution post go-live and resolving any issues discovered by the client and client services teams.
- Documented and tracked operational problems, following standards and procedures, using the software reporting tool JIRA.
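Below is a minimal sketch of the kind of data-cleaning MapReduce mapper described above, assuming tab-delimited input records; the field count and cleaning rules are illustrative only, not the actual project logic.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleaning step: drops malformed records and trims whitespace
// before the data is loaded into Hive for aggregation.
public class CleanRecordsMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 5; // illustrative schema width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t", -1);
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleaning", "malformed").increment(1);
            return; // skip malformed rows
        }
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                cleaned.append('\t');
            }
            cleaned.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(cleaned.toString()));
    }
}
```

A map-only job of this shape would typically be chained ahead of the Hive or Pig aggregation steps through an Oozie workflow.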
Confidential, Baltimore, MD
BIG DATA/ HADOOP DEVELOPER
Responsibilities:
- Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology
- Experience with Big Data analytics implementations using Hadoop, MapReduce, and the Cloudera and Hortonworks distributions.
- Developed data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Performed cluster co-ordination and assisted with data capacity planning and node forecasting using ZooKeeper.
- Extracted data from Oracle, SQL Server, and MySQL databases to HDFS using Sqoop.
- Worked on writing and optimizing MapReduce jobs.
- Experience in writing Pig scripts to transform raw data from several data sources into forming baseline data.
- Created Hive tables to store the processed results in a tabular format and wrote Hive scripts to transform and aggregate the disparate data.
- Automated the process for extraction of data from warehouses and weblogs into HIVE tables by developing workflows and coordinator jobs in Oozie.
- Transferred data from Hive tables to HBase via stage tables using Pig and used Impala for interactive querying of HBase tables.
- Experience in using Avro, Parquet, RCFile and JSON file formats and developed UDFs using Hive and Pig.
- Responsible for cluster maintenance, rebalancing blocks, commissioning and decommissioning of nodes, monitoring and troubleshooting, manage and review data backups and log files.
- Exported the aggregated data to an RDBMS using Sqoop for creating dashboards in Tableau, and developed trend analysis using statistical features.
- Responsible for building scalable distributed data solutions on a cluster using Cloudera Distribution.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Developed Sqoop scripts to import and export data from MySQL and handled incremental and updated changes into HDFS layer.
- Developed a workflow in Oozie to orchestrate a series of Pig scripts that cleanse the data, such as removing unnecessary information or merging many small files into larger compressed files, using Pig pipelines in the data preparation stage.
- Created Hive tables and loaded the data into tables to query using HiveQL.
- Implemented partitioning and bucketing in HIVE tables and executed the scripts in parallel to improve the performance.
- Created HBase tables to store various data formats as input coming from different sources.
- Developed different kinds of custom filters and handled pre-defined filters on HBase data using the Java API (a brief scan-with-filter sketch follows this list).
- Utilized Agile Scrum Methodology to manage and organize the team with regular code review sessions.
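To illustrate the HBase Java API filtering mentioned above, here is a minimal sketch using the pre-defined SingleColumnValueFilter; the table name, column family, qualifier, and value are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Scans a hypothetical "transactions" table and keeps only rows whose
// "info:status" column equals "FAILED", using a pre-defined HBase filter.
public class FailedTransactionScan {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("transactions"))) {

            SingleColumnValueFilter filter = new SingleColumnValueFilter(
                    Bytes.toBytes("info"), Bytes.toBytes("status"),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("FAILED"));
            filter.setFilterIfMissing(true); // drop rows without the column

            Scan scan = new Scan();
            scan.setFilter(filter);

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}
```

A custom filter would follow the same scan pattern but extend FilterBase with project-specific row or column logic.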
Confidential, Houston, TX
BIG DATA/ HADOOP DEVELOPER
Responsibilities:
- Developed multiple MapReduce jobs in java for data cleaning and pre-processing.
- Designed and developed Oozie workflows for automating jobs.
- Mainly worked on Big Data analytics and Hadoop/MapReduce infrastructure.
- Gained good experience with NoSQL databases.
- Developed and ran MapReduce programs on the cluster.
- Installed and configured Hive, and also wrote Hive UDFs.
- Created HBase tables to store data in various formats coming from different portfolios.
- Implemented best income logic using Pig scripts.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the Business Intelligence (BI) team.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Wrote Hadoop MR programs to collect the logs and feed them into Cassandra for analytics purposes.
- Moved data from Oracle to HDFS and vice versa using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked with different file formats and compression techniques to determine standards
- Developed Hive queries and UDF's to analyze/transform the data in HDFS.
- Developed Hive scripts for implementing control tables logic in HDFS.
- Designed and implemented partitioning (static, dynamic) and bucketing in Hive.
- Developed Pig scripts and UDFs as per the business logic (a minimal Pig UDF sketch follows this list).
- Imported log files into HDFS using Flume and loaded them into Hive tables to query the data.
- Developed Pig scripts to convert data from Avro to text file format.
- Developed Sqoop commands to pull the data from Teradata.
- Analyzed and transformed data with Hive and Pig.
- Developed Oozie workflows, which are scheduled through a scheduler on a monthly basis.
- Designed and developed read lock capability in HDFS.
- Involved in End-to-End implementation of ETL logic.
- Effective coordination with the offshore team and managed project deliverables on time.
- Worked on QA support activities, test data creation and Unit testing activities.
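As a sketch of the Pig UDF work referenced above, here is a minimal Java EvalFunc; the class name and normalization rule are hypothetical, shown only to illustrate the pattern.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Pig EvalFunc UDF: upper-cases and trims a chararray field so that
// downstream joins and group-bys are not affected by inconsistent casing.
public class NormalizeField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null; // pass nulls through unchanged
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

A UDF like this is registered in a Pig script with REGISTER and then invoked inside a FOREACH ... GENERATE statement.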
Confidential
JAVA/J2EE DEVELOPER
Responsibilities:
- Involved in various phases of Software Development Life Cycle (SDLC) of the application like Requirement gathering, Design, Analysis and Code development.
- Prepared Use Cases, sequence diagrams, class diagrams and deployment diagrams based on UML to enforce Rational Unified Process using Rational Rose.
- Extensively worked on the user interface for a few modules using HTML, JSPs, and JavaScript.
- Generated business logic using servlets and session beans and deployed them on the WebLogic server.
- Created complex SQL queries and stored procedures.
- Used Hibernate ORM framework with spring framework for data persistence and transaction management.
- Wrote test cases in JUnit for unit testing of classes.
- Provided technical support for production environments, resolving issues, analyzing defects, and providing and implementing solutions to defects.
- Built and deployed Java application into multiple Unix based environments and produced both unit and functional test results along with release notes.
- Analyzed the banking and existing system requirements and validated them to suit J2EE architecture.
- Designed the process flow between front-end and server side components
- Developed and implemented the MVC Architectural Pattern using Struts Framework including JSP, Servlets, EJB, Form Bean and Action classes.
- Developed the web-based presentation layer using JSP and AJAX with Servlet technologies, implemented using the Struts framework.
- Designed and developed backend java Components residing on different machines to exchange information and data using JMS.
- Involved in creating Hibernate POJO objects and mapping them using Hibernate annotations (a minimal annotated-entity sketch follows this list).
- Used JavaScript for client-side validation and Struts Validator Framework for form validations.
- Implemented Java/J2EE Design patterns like Business Delegate and Data Transfer Object (DTO), Data Access Object.
- Wrote JUnit test cases for unit testing.
- Integrated Spring DAO for data access using Hibernate, used HQL and SQL for querying databases.
- Worked with QA team for testing and resolving defects.
- Used ANT automated build scripts to compile and package the application.
- Used JIRA for bug tracking and project management.
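To illustrate the Hibernate annotation mapping mentioned above, here is a minimal sketch of an annotated POJO; the entity, table, and column names are hypothetical and not taken from the actual banking application.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

// Hibernate-mapped POJO for a hypothetical "accounts" table.
@Entity
@Table(name = "accounts")
public class Account {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "account_id")
    private Long id;

    @Column(name = "account_name", nullable = false, length = 100)
    private String name;

    @Column(name = "balance")
    private Double balance;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public Double getBalance() { return balance; }
    public void setBalance(Double balance) { this.balance = balance; }
}
```

With mappings expressed this way, a Spring-managed Hibernate DAO layer can persist and query the entity using HQL without separate XML mapping files.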