Sr Hadoop Developer/ Spark Developer Resume
Kansas City, MO
SUMMARY
- Senior Hadoop developer with 8+ years of professional IT experience, including 4 years of Big Data consulting experience with Hadoop ecosystem components covering ingestion, data modeling, querying, processing, storage, analysis, data integration, and implementation of enterprise-level Big Data systems.
- A skilled developer with strong problem solving, debugging and analytical capabilities, who actively engages in understanding customer requirements.
- Results-oriented and hands-on; skillfully balances resource and time constraints while doing it right.
- Strong knowledge of the Hadoop ecosystem, including MapReduce, YARN, HDFS, Hive, Pig, HBase, Sqoop, Oozie, Flume, Mahout, Apache Drill, ZooKeeper, Solr & Lucene, and Impala.
- Experienced in collecting metrics for Hadoop clusters using Ambari & Cloudera Manager.
- Experienced in YARN environments with Storm, Spark, Kafka, and Avro.
- Good knowledge of Hadoop architecture and its daemons, such as JobTracker, TaskTracker, NameNode, and DataNode.
- Experienced with all major Hadoop distributions, including Cloudera, Hortonworks, MapR, and the Amazon Web Services distribution of Hadoop.
- Good knowledge of Scala programming concepts.
- Experienced with Scala and Spark, improving the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, pair RDDs, and Spark on YARN (a minimal sketch follows this list).
- Experienced in writing Spark scripts using Python shell commands as per requirements.
- Hands on experience in writing MapReduce jobs using Java for the data ingestion and aggregation.
- Hands-on experience in developing MapReduce programs and user-defined functions (UDFs) for Hive and Pig.
- Experienced in using partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
- Hands-on experience in NoSQL databases like HBase, Cassandra, and MongoDB.
- Responsible for developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Extensively used ETL methodology to support data extraction, transformation, and loading using Informatica.
- Exposure to Spark, Hadoop, Mahout, etc.
- Familiar with using Apache Drill in data-intensive distributed applications for interactive analysis of large-scale datasets.
- Expertise in scheduling and monitoring Hadoop workflows using Oozie and ZooKeeper.
- Experienced with distributed messaging queues using Apache Kafka.
- Experienced in creating tables on top of Parquet format in Impala.
- Implemented Storm topologies to perform cleansing operations before moving data into HBase.
- Expertise in search technologies such as Solr & Lucene.
- Proficient in Big data ingestion tools like Flume, Kafka, Spark Streaming and Sqoop for streaming and batch data ingestion.
- Very good understanding and working knowledge of object-oriented programming (OOP), Python, and Scala.
- Experienced in working with Hadoop/Big Data storage and analytical frameworks on the Amazon AWS cloud using tools like SSH and PuTTY.
- Good domain knowledge of Retail, Healthcare, and Insurance.
- Built scripts using Maven and deployed applications on the JBoss application server.
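For illustration, a minimal Java sketch of the pair-RDD style of aggregation referenced above, replacing a plain MapReduce count with Spark transformations; the class name, input layout, and HDFS paths are hypothetical and not taken from any specific engagement listed below.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ClickCountBySku {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ClickCountBySku"); // submitted on YARN via spark-submit
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Each input line is assumed to be: timestamp,sku,url (hypothetical layout)
        JavaPairRDD<String, Long> countsBySku = sc
                .textFile("hdfs:///data/weblogs/clicks")    // hypothetical HDFS path
                .mapToPair(line -> new Tuple2<>(line.split(",")[1], 1L))
                .reduceByKey(Long::sum);                     // combines partial counts before the shuffle

        countsBySku.saveAsTextFile("hdfs:///data/weblogs/clicks_by_sku");
        sc.stop();
    }
}
```

The map-side combining performed by reduceByKey is the main source of the speed-up over an equivalent two-stage MapReduce job.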
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, Hue, Impala, YARN, Oozie, ZooKeeper, MapR, CDH, HDP, Apache Spark, Apache Kafka, Apache Storm, Apache Crunch, Avro, Parquet, Apache NiFi.
Programming Languages: C, Java, Python, Unix Shell Scripting.
Java Technologies: J2EE, JDBC, JUnit, Log4j.
Web Technologies: HTML, CSS, JavaScript, jQuery, JSP, Servlets, AJAX.
IDE Development Tools: Eclipse, NetBeans.
Frameworks: MVC, Struts, Hibernate, Spring.
Web Servers: WebLogic, WebSphere, Apache Tomcat.
Databases: Netezza, SQL Server, MySQL, Oracle, DB2.
NoSQL Databases: HBase, MongoDB, Cassandra.
Development Methodologies: Waterfall, Agile Methodologies (Scrum).
Operating Systems: Windows, Linux, Unix.
Software Management Technologies: SVN, Git, Maven.
PROFESSIONAL EXPERIENCE
Confidential - Kansas City, MO
Sr Hadoop Developer/ Spark Developer
RESPONSIBILITIES:
- Designed and implemented custom NiFi processors that reacted to, processed, and provided detailed custom reporting for all stages of the pipeline.
- Designed and deployed advanced graphical visualizations in Tableau with drill-down and drop-down menu options and parameterized views.
- Wrote SAS/SQL ad hoc programs for marketing, financial and technical support departments.
- Applied knowledge/theory to a series of case studies taken directly from the SAS Consulting division, working on small teams and delivering results/findings to stakeholders.
- Migrated MapReduce jobs into Spark RDD transformations using Scala.
- Developed Spark code using Spark SQL and Spark Streaming for faster processing of data.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Spark SQL to load tables into HDFS and run select queries on top of them (see the Spark SQL sketch after this list).
- Converted all the processing from Netezza and reimplemented it using Spark DataFrames and RDDs.
- Used Sqoop to extract data from various data sources into Hadoop HDFS. This included data from Excel, ERP systems, databases, CSV files, and log data from sensors/meters.
- Developed MapReduce programs to cleanse data.
- Created Hive tables per requirements as internal or external tables, defined with appropriate static and dynamic partitions for efficiency.
- Used the Hive data warehouse tool to analyze the unified historical data in HDFS to identify issues and behavioral patterns.
- Worked with various HDFS file formats like Avro, Sequence File and various compression formats like Snappy, bzip2.
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Implemented custom Hive UDFs to integrate weather and geographical data, producing business data for comprehensive analysis (see the UDF sketch after this list).
- Created external Hive tables and defined static and dynamic partitions as per requirement for optimized performance on production datasets.
- Used Hive table definitions to map output files to tables.
- Merged small files and loaded them into HDFS using Java code; tracking history for the merged files was maintained in HBase.
- Used Pig as an ETL tool for transformations, event joins, and some pre-aggregations before storing data into HDFS, and developed MapReduce programs for parsing data and loading it into HDFS.
- Processed the output from Pig and Hive and formatted it before writing it to the Hadoop output files.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers/sensors.
- Used Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java map-reduce, Hive and Sqoop as well as system specific jobs.
- Used the RegEx, JSON, and Avro SerDes packaged with Hive to parse the contents of streamed log data.
- Worked along with the Hadoop Operations team in Hadoop cluster planning, installation, maintenance, monitoring and upgrades.
- Added methods for performing CRUD operations in applications using JDBC and wrote several SQL queries.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Integrated with RDBMS sources using Sqoop and JDBC connectors.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to adopting the former in the project.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
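A minimal sketch (assuming a Spark 2.x-style SparkSession) of the Spark SQL pattern described above for querying Hive tables on YARN and writing the result back to HDFS; the database, table, and column names are illustrative assumptions.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ClaimsDailySummary {
    public static void main(String[] args) {
        // Submitted with --master yarn; table and column names are illustrative.
        SparkSession spark = SparkSession.builder()
                .appName("ClaimsDailySummary")
                .enableHiveSupport()          // lets Spark SQL read tables from the Hive metastore
                .getOrCreate();

        Dataset<Row> summary = spark.sql(
                "SELECT claim_date, COUNT(*) AS claim_cnt " +
                "FROM warehouse.claims " +
                "GROUP BY claim_date");

        // Write the aggregated result back to HDFS as Parquet for downstream consumers.
        summary.write().mode("overwrite").parquet("hdfs:///warehouse/summary/claims_daily");

        spark.stop();
    }
}
```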
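A minimal sketch of a custom Hive UDF of the kind mentioned above, using the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and the normalization rule are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Illustrative Hive UDF: normalizes a free-form weather-station code so it can
 * be joined against geographical reference data. The rule itself is a placeholder.
 */
public final class NormalizeStationCode extends UDF {
    public Text evaluate(Text raw) {
        if (raw == null) {
            return null;                                   // Hive passes NULLs straight through
        }
        String cleaned = raw.toString().trim().toUpperCase().replaceAll("[^A-Z0-9]", "");
        return new Text(cleaned);
    }
}

// Registered in Hive roughly as:
//   ADD JAR normalize-udf.jar;
//   CREATE TEMPORARY FUNCTION normalize_station AS 'NormalizeStationCode';
```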
ENVIRONMENT: Hortonworks Hadoop, HDFS, Hive, HQL scripts, Scala, Kafka, MapReduce, Storm, Java, HBase, Pig, Sqoop, Shell Scripts, Oozie coordinator, MySQL, Linux.
Confidential, Illinois
Sr. Hadoop Developer
RESPONSIBILITIES:
- Identified the key areas of the solution and parallelized the data loads/processing which cut the runtime by 40%.
- Worked on managing data coming from different sources.
- Worked in a team of 3 offshore and 2 onshore resources, successfully planning, designing, and building the solution end to end, enabling user-driven analytics on top of the dealer data lake residing in Hive.
- Coordinated cross-functionally on gap analysis, cleansing, and integration of data coming from various sources such as Omniture, Netezza, DART, TubeMogul, and the content management system "The Platform".
- Responsible for managing and scheduling jobs on Hadoop Cluster.
- Hands on experience in database performance tuning and data modeling.
- Actively involved in code review and bug fixing for improving the performance.
- Good experience handling data manipulation using Python scripts.
- Used Impala to pull data from Hive tables.
- Worked with the key stakeholders of different business groups to identify the core requirements in building the next-generation analytics solution, using Impala as the processing framework and Hadoop for storage on the current dealer data lake.
- Experience developing MapReduce jobs to transform data and store it into HBase and Impala.
- Involved in loading data from UNIX file system to HDFS.
- To analyze data migrated to HDFS, used Hive data warehouse tool and developed Hive queries.
- Ran various Hive queries on the data dumps and generated aggregated datasets for downstream systems for further analysis.
- Performed research on Hive to analyze the partitioned and bucketed data and compute various metrics to determine the performance on the Hadoop cluster.
- Performed data validation, identified and resolved issues between Oracle and Hadoop, and helped the client retire the Oracle system that used to host the DDSW solution.
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
- Plan and manage HDFS storage capacity. Advise a team on best tool selection, best practices, and optimal processes using Sqoop, Flume, Pig, Oozie, Hive, and Bash Shell Scripting.
- Used Sqoop to import and export data between RDBMS and HDFS.
- Extracted data from Teradata into HDFS using Sqoop.
- Ingested data using Sqoop and HDFS put/copyFromLocal.
- Experience in denormalized data design for Impala.
- Experienced in managing Hadoop log files.
- Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
- Implemented automatic failover using ZooKeeper and the ZooKeeper Failover Controller.
- Designed ETL flow for several Hadoop Applications.
- Designed and Developed Oozie workflows, integration with Pig.
- Used OOZIE operational services for batch processing and scheduling workflows dynamically.
- Extensively worked on creating end-to-end data pipeline orchestration using Oozie.
- Implemented daily Cron jobs that automate parallel tasks of loading the data into HDFS using Oozie coordinator jobs.
- Used Flume to collect, aggregate, and store the log data from different web servers.
- Experience with installing and configuring distributed messaging systems like Kafka.
- Assessed the current state of the project, then architected and developed new delta functionality using a hash-based approach, which saved consumers significant time by loading only the delta (the change in the received dealer data) rather than the entire data set again (a sketch of this approach follows this list).
- Monitored multiple cluster environments using Ambari.
- Used Git as version control to check out and check in files.
- Very good understanding of how to assign the number of mappers and reducers for MapReduce jobs on the cluster.
- Implemented a product recommendation service using Mahout (see the Mahout sketch after this list).
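A minimal sketch of the hash-based delta approach described above, kept deliberately simple: hash each record, compare against the previous snapshot's hash per business key, and emit only new or changed records. The record layout and the class/method names are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.xml.bind.DatatypeConverter;

public class DealerDeltaDetector {

    private final Map<String, String> previousHashes; // dealerId -> record hash from the last run

    public DealerDeltaDetector(Map<String, String> previousHashes) {
        this.previousHashes = previousHashes;
    }

    /** Returns only the records that are new or changed since the previous snapshot. */
    public Map<String, String> computeDelta(List<String> currentRecords) throws Exception {
        Map<String, String> delta = new HashMap<>();
        MessageDigest digest = MessageDigest.getInstance("MD5");
        for (String record : currentRecords) {
            // Assumed layout: "dealerId|restOfRecord"
            String dealerId = record.substring(0, record.indexOf('|'));
            byte[] hashBytes = digest.digest(record.getBytes(StandardCharsets.UTF_8));
            String hash = DatatypeConverter.printHexBinary(hashBytes);
            if (!hash.equals(previousHashes.get(dealerId))) {
                delta.put(dealerId, record);              // new or changed dealer record
            }
        }
        return delta;
    }
}
```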
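A minimal sketch of a user-based recommender built with the Mahout Taste API, in the spirit of the product recommendation service mentioned above; the ratings file, neighborhood size, and similarity choice are assumptions.

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class ProductRecommender {
    public static void main(String[] args) throws Exception {
        // ratings.csv rows: userId,productId,preference -- file name and path are illustrative
        DataModel model = new FileDataModel(new File("ratings.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Top 3 product recommendations for user 42
        List<RecommendedItem> recommendations = recommender.recommend(42L, 3);
        for (RecommendedItem item : recommendations) {
            System.out.println(item.getItemID() + " -> " + item.getValue());
        }
    }
}
```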
ENVIRONMENT: Hadoop, MapReduce, Hive, HDFS, Pig, Sqoop, Flume, HBase, Spark, ZooKeeper, Ambari (Hortonworks), AWS, SQL Server, Teradata, MySQL, Impala, Python, UNIX, TortoiseGit.
Confidential - Norwalk, CT
Hadoop Developer
RESPONSIBILITIES:
- Worked on importing and exporting data between DB2 and HDFS using Sqoop.
- Used Flume to collect, aggregate and store the web log data from different sources like web servers, mobile and network devices to be pushed into HDFS.
- Developed MapReduce programs in Java to convert data from JSON format to CSV and TSV formats for analytics (a minimal mapper sketch appears after this list).
- Developed Pig Latin scripts for cleansing and analysis of semi-structured data.
- Experienced in debugging MapReduce jobs and Pig scripts.
- Used Pig as ETL tool to do transformations, event joins and pre-aggregations before storing the data into HDFS.
- Experience in creating Hive tables, loading them with data, and writing Hive queries.
- Experience in migrating ETL processes from relational databases to Hive to enable easier data manipulation.
- Used Hive to analyze the partitioned and bucketed data to compute various metrics for reporting.
- Wrote Hive and Pig UDFs to perform aggregations supporting the business use case.
- Performed MapReduce integration to import large amounts of data into HBase.
- Experience with performing CRUD operations using the HBase Java client API (see the HBase sketch after this list).
- Developed shell scripts to automate MapReduce jobs for data processing.
- Used Apache Drill for low-latency, sub-second queries.
- Developed enterprise Lucene/Solr-based solutions, including custom type/object modeling and implementation in the Lucene/Solr analysis (tokenizers/filters) pipeline.
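A minimal mapper sketch for the JSON-to-CSV conversion mentioned above, written as a map-only Hadoop job with Jackson for parsing; the JSON field names and output layout are illustrative assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

/**
 * Map-only job sketch: each input line is one JSON event; the mapper emits a
 * CSV line. The field names (user, page, ts) are illustrative assumptions.
 */
public class JsonToCsvMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private final ObjectMapper jsonParser = new ObjectMapper();
    private final Text csvLine = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        JsonNode event = jsonParser.readTree(value.toString());
        csvLine.set(event.path("user").asText() + ","
                  + event.path("page").asText() + ","
                  + event.path("ts").asText());
        context.write(NullWritable.get(), csvLine);
    }
}
```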
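A minimal sketch of CRUD operations with the HBase Java client API (HBase 1.x style Connection/Table); the table name, column family, and row key are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WebEventDao {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("web_events"))) {

            // Create / update
            Put put = new Put(Bytes.toBytes("row-001"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("url"), Bytes.toBytes("/home"));
            table.put(put);

            // Read
            Result result = table.get(new Get(Bytes.toBytes("row-001")));
            String url = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("url")));
            System.out.println("url = " + url);

            // Delete
            table.delete(new Delete(Bytes.toBytes("row-001")));
        }
    }
}
```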
ENVIRONMENT: Cloudera Manager, Java, Shell, SQL, Hadoop, HDFS, Sqoop, Flume, MapReduce, Pig, Hive, Oracle, MongoDB, HBase, JDK, Agile SCRUM.
Confidential
Java Developer
RESPONSIBILITIES:
- Developed the Spring container, controller classes, and Spring configuration XML files.
- Implemented the Spring controller layer with dependency wiring and transaction management for claims transactions.
- Used Spring MVC to implement MVC Design Patterns.
- Used Hibernate for mapping claim data by connecting to Oracle database.
- Implemented Spring Java-based SOAP web services for payment authorization and wrote JUnit tests for my code.
- Used JAXB for converting data from Java objects to XML files and vice versa (a minimal sketch follows this list).
- Developed web service stubs from the provided WSDL using Apache Axis.
- Designed and coded the grouping of all diagnosis and procedure codes in mediators to generate accurate adjudication, using Java/J2EE elements to add support for additional project functions.
- Made extensive use of HTML, CSS, JavaScript, and jQuery for creating and maintaining the user interface.
- Used Maven for building the project.
- Used Service-Oriented Architecture to create different services, implementing functionality using different queues and the Oracle Bus for communication.
- Involved in deploying the application using WebLogic.
- Involved in various phases of the Software Development Life Cycle (SDLC), such as requirement gathering, modeling, analysis, architecture, and development; the project was developed using Agile methodologies.
- Involved in Scrum meetings, performed sprint planning every two weeks, and set up daily stand-up meetings to monitor status.
- Used SVN as version control for the common source code used by developers.
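A minimal sketch of the JAXB object-to-XML conversion mentioned above; the PaymentAuthorization payload class and its fields are hypothetical stand-ins for the real claim objects.

```java
import java.io.StringWriter;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlRootElement;

public class JaxbExample {

    // Hypothetical payload class; the real claim/payment objects had more fields.
    @XmlRootElement
    public static class PaymentAuthorization {
        public String claimId;
        public double amount;
    }

    public static void main(String[] args) throws Exception {
        PaymentAuthorization auth = new PaymentAuthorization();
        auth.claimId = "CLM-1001";
        auth.amount = 250.00;

        JAXBContext context = JAXBContext.newInstance(PaymentAuthorization.class);
        Marshaller marshaller = context.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);

        StringWriter xml = new StringWriter();
        marshaller.marshal(auth, xml);        // Java object -> XML
        System.out.println(xml);

        // The reverse direction uses context.createUnmarshaller().unmarshal(...)
    }
}
```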
ENVIRONMENT: Java, J2EE (JSPs & Servlets), JUnit, LINUX, HTML, CSS, JavaScript, Apache Tomcat, MySQL.
Confidential
Java Developer
RESPONSIBILITIES:
- Designed JSP pages using Struts tag libraries, HTML, DHTML, AJAX, and JavaScript.
- Used Hibernate for establishing connection and interacting with the database.
- Created database connections using the Hibernate SessionFactory, and used Hibernate APIs to retrieve and store images in the database with Hibernate transaction control (a minimal sketch follows this list).
- The front-end JSP pages were developed using the Struts framework and hosted in a J2EE environment.
- Integrated the application with Struts Validation framework to do business validations.
- Involved in the development of CRUD (Create, Read, Update, and Delete) functionality for various administrative system tables and product components.
- Worked with the QA team to validate whether the test cases met the business requirements.
- Conducted Unit Testing, interface testing, system testing and user acceptance testing.
- Developed the presentation layer written using JSP, HTML, CSS and client-side validations were done using JavaScript, jQuery, and JSON.
- Designed additional UI components using JavaScript and implemented an asynchronous, AJAX-based rich client to improve customer experience.
- Designed static and dynamic web pages using JSP, HTML, CSS and SASS.
- Developed the application using the Spring Framework.
- Updated and maintained the sequence diagrams for the given design.
- Used WebLogic Application Server for deploying various components of the application.
- Developed the user interface screens for presentation logic using JSP, CSS, and HTML, with client-side validation scripts using JavaScript.
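A minimal sketch of the Hibernate SessionFactory usage mentioned above, with explicit transaction control around a save; the DAO and entity references are illustrative assumptions.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

public class ImageDao {

    // Built once from hibernate.cfg.xml on the classpath.
    private static final SessionFactory SESSION_FACTORY =
            new Configuration().configure().buildSessionFactory();

    public void saveImage(Object imageRecord) {
        Session session = SESSION_FACTORY.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(imageRecord);   // persist the mapped entity
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();               // keep the connection in a clean state
            throw e;
        } finally {
            session.close();
        }
    }

    public Object loadImage(Class<?> entityType, Long id) {
        Session session = SESSION_FACTORY.openSession();
        try {
            return session.get(entityType, id);
        } finally {
            session.close();
        }
    }
}
```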
ENVIRONMENT: Java, J2EE (JSPs & Servlets), JUnit, LINUX, HTML, CSS, JavaScript, Apache Tomcat, MySQL.