Sr. Hadoop Developer/Data Engineer Resume
Seattle, WA
SUMMARY:
- Senior Software Engineer with 10 years of professional IT experience, including 8+ years in the Big Data ecosystem spanning ingestion, storage, querying, processing, and analysis of large datasets.
- Strong knowledge of distributed computing concepts and parallel processing techniques using frameworks such as MapReduce and Spark/Scala.
- Good experience writing Spark applications in Python and Scala.
- Used sbt to build Scala-based Spark projects and executed them with spark-submit.
- Strong hands-on experience with Hadoop ecosystem components including Spark, MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Flume, Pig, and Oozie.
- Experience includes Requirements Gathering/Analysis, Design, Development, Versioning, Integration, Documentation, Testing, Build and Deployment
- Strong experience working with SQL-on-Hadoop engines such as Hive and Impala for data analysis on large datasets.
- Familiar with NoSQL big-data databases such as HBase, MongoDB, and Cassandra.
- Experience working with MapReduce programs, Pig scripts, and Hive queries for large-scale batch data processing.
- Good knowledge of building event-processing data pipelines using Kafka and Spark Streaming (see the producer sketch after this list).
- Good knowledge of and experience with Hive query optimization and performance tuning.
- Hands-on experience writing Pig Latin scripts and custom Hive and Pig UDFs.
- Experience supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud; performed export and import of data to and from S3.
- Imported and exported data between relational databases and HDFS/Hive using Sqoop.
- Experience using Flume to load log files into HDFS and Oozie for workflow design and scheduling.
- Developed, monitored, and scheduled jobs using UNIX shell scripting.
- Hands-on experience with Hadoop administration, configuration management, monitoring, debugging, and performance tuning.
- Experience tuning Hadoop clusters to achieve good processing performance.
- Well versed in installing, configuring, supporting, and managing Hadoop clusters and the underlying Big Data infrastructure.
- Strong analytical skills with the ability to quickly understand clients' business needs.
- Experience in deploying and managing the Hadoop cluster using Cloudera Manager.
- Experience upgrading existing Hadoop clusters to the latest releases.
- Experience setting up automated monitoring and escalation infrastructure for Hadoop clusters using Ambari.
- Experience with MongoDB.
- Experience in data integration between RDBMS and Hadoop.
- Experienced in using NFS (Network File System) for NameNode metadata backup.
- Experience with web services and front-end technologies including XML, HTML, Ajax, jQuery, and JSON.
- Familiarity with popular frameworks such as Hibernate, Spring, and Spring MVC.
- Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
- Well trained in Problem Solving Techniques, Operating System Concepts, Programming Basics, Structured Programming and RDBMS
- Experienced in monitoring offshore and onsite team activities and participating in regular client meetings.
- Strong functional and imperative programming skills, with a focus on writing optimized code across platforms.
- Technical professional with management skills, excellent business understanding and strong communication skills
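As referenced in the Kafka/Spark Streaming bullet above, the producing side of such an event pipeline typically uses the standard Kafka Java client. The sketch below is a minimal, illustrative example; the broker address, topic name, key, and JSON payload are placeholders, not details from any project described in this resume.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker list and topic are placeholders for illustration only.
        props.put("bootstrap.servers", "broker1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // wait for full acknowledgement before a send is considered complete

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by user id keeps events for the same user in the same partition.
            producer.send(new ProducerRecord<>("clickstream-events", "user123", "{\"page\":\"/home\"}"));
        }
    }
}
```

A Spark Streaming (or similar) consumer would subscribe to the same topic and apply the downstream processing logic.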
TECHNICAL SKILLS:
Big Data and Hadoop Ecosystem: HDFS, Hadoop MapReduce, Hive, Pig, Sqoop, Spark, YARN, HBase, ZooKeeper, Flume, Oozie, Cloudera Manager, Ambari, Hortonworks
Databases: Oracle 11g/10g/9i/8i, Microsoft SQL Server 2005/2008, MySQL, DB2, Flat Files.
Languages: Java, HiveQL, Pig Latin, Advanced PL/SQL, SQL, C++, C, Shell, SQL*Plus 3.3/8.0, Scala
NoSQL Databases: HBase, Cassandra, MongoDB.
Web Tools/Frameworks: HTML, JavaScript, Python, XML, ODBC, JDBC, JavaBeans, EJB, MVC, Ajax, JSON, JSP, Servlets, Struts, REST API, Spring, Hibernate.
Cloud Computing: Amazon Web Services (EC2, EMR, S3, RDS), SOA
Web/Application Servers: Apache Tomcat, GlassFish 4.0, WebLogic
Build and Build Management Tools: Maven, Ant, Jenkins
Databases Tools: TOAD, SQL Developer, SQL Workbench
Development Tools: Eclipse, NetBeans
PROFESSIONAL EXPERIENCE:
Confidential, Seattle, WA
Sr. Hadoop Developer/Data Engineer
Responsibilities:
- Developed custom data ingestion adapters in Spark/Scala to extract log and clickstream data from external systems and load it into HDFS.
- Developed Spark programs using the Scala API to compare Spark performance with Hive (HQL).
- Implemented Spark jobs in Scala using Spark SQL and DataFrames for faster testing and processing of data.
- Created Hive tables, loaded data, and wrote Hive queries to build analytical datasets.
- Developed a working prototype for real-time data ingestion and processing using Kafka, Spark Streaming, and HBase.
- Used Spark and Spark SQL with the Scala API to read Parquet data and create the corresponding tables in Hive.
- Worked with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Developed a Kafka producer and a Spark Streaming consumer to read the stream of events and apply business rules.
- Designed and developed Job flows using TWS.
- Developed Sqoop commands to pull the data from Teradata.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Used the Pentaho Data Integration (PDI) tool for data integration, OLAP analysis, and ETL processing.
- Used Avro and Parquet file formats and Snappy compression throughout the project.
- Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded the results into HBase for further processing (see the sketch below).
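The last bullet mentions loading the transformed records into HBase. Below is a minimal sketch of that write path using the standard HBase Java client; the table name, column family, row-key layout, and column values are assumptions for illustration, not the actual schema used on the project.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ClickstreamHBaseWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("clickstream_events"))) {
            // Row key and columns are illustrative; real keys would come from the incoming Avro records.
            Put put = new Put(Bytes.toBytes("user123#2015-06-01T12:00:00"));
            put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("page"), Bytes.toBytes("/home"));
            put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("referrer"), Bytes.toBytes("direct"));
            table.put(put);
        }
    }
}
```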
Environment: Cloudera CDH 5.x, Pentaho, HDFS, Hadoop 2.2.0 (YARN), Eclipse, Spark, Scala, Hive, Pig Latin, Sqoop, ZooKeeper, Apache Kafka, Apache Storm, MySQL.
Confidential, Portland, OR
Sr. Hadoop Developer/Big Data Engineer
Responsibilities:
- Involved in the installation and configuration of the JDK, Hadoop, Pig, Sqoop, Hive, and HBase in a Linux environment; assisted with performance tuning and monitoring.
- Created MapReduce programs to parse data for claim report generation and ran the JARs on Hadoop; coordinated with the Java team on MapReduce development.
- Created Pig scripts for most modules to provide comparative effort estimates for code development.
- Created reports for the BI team, using Sqoop to move data between HDFS/Hive and relational stores.
- Collaborated with BI teams to ensure data quality and availability for live visualization.
- Created Hive queries to process large sets of structured, semi-structured, and unstructured data and stored the results in managed and external tables.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Involved in running Hadoop Jobs for processing millions of records of text data.
- Developed the application by using the Struts framework.
- Created connection through JDBC and used JDBC statements to call stored procedures.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Developed Pig UDFs in Java to pre-process the data for analysis.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing (a minimal mapper sketch follows this list).
- Moved RDBMS data and flat files generated from various channels into HDFS for further processing.
- Developed job workflows in Oozie to automate the tasks of loading the data into HDFS.
- Handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Teradata into HDFS using Sqoop.
- Wrote script files for processing data and loading it into HDFS.
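As referenced above, several of these jobs were plain Java MapReduce. The sketch below shows a minimal log-parsing mapper of the kind described; the log layout (a space-delimited line with the claim type in the second field) is an assumption for illustration only, and it would be paired with a summing reducer.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (claimType, 1) for every well-formed log line; malformed lines are skipped.
public class ClaimLogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text claimType = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\s+");
        if (fields.length > 1) {                 // assumed layout: timestamp claimType ...
            claimType.set(fields[1]);
            context.write(claimType, ONE);
        }
    }
}
```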
Environment: Cloudera, Hadoop MapReduce, HDFS, Pig, Hive, Java, Flat files, Oracle 11g/10g, PL/SQL, SQL*PLUS, Windows NT, Sqoop.
Confidential, Houston, TX
Sr. Hadoop Developer
Responsibilities:
- Worked on Hadoop cluster scaling from 4 nodes in the development environment to 8 nodes in pre-production and up to 24 nodes in production.
- Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig, and Hive programs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used HiveQL to query Hive tables stored in HDFS.
- Performed various optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Developed custom UDFs in Java to extend Hive and Pig Latin functionality (see the UDF sketch after this list).
- Created HBase tables to store data in various formats coming from different portfolios.
- Managed and scheduled Oozie jobs to remove duplicate log data files in HDFS.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Implemented test scripts to support test driven development and continuous integration.
- Responsible for managing data coming from different sources.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suited the current requirements.
- Used the file system check utility (fsck) to check the health of files in HDFS.
- Involved in creating dashboards by extracting data from different sources.
- Created dashboard using parameters, sets, groups and calculations.
- Involved in creating interactive dashboard and applied actions (filter, highlight and URL) to dashboard.
- Involved in creating calculated fields, mapping and hierarchies.
- Created drill through reports in dashboard.
- Developed the UNIX shell scripts for creating the reports from Hive data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Designed high-performance mappings using iterative variable logic in Expression transformations.
- Developed and documented Data Mappings/Transformations, and Informatica sessions as per the business requirement.
- Tuned the Informatica ETL processes for optimum performance by integrating ETL processes based on common sources and optimizing the source SQL to use the correct indexes.
- Designed and developed the Informatica Workflows for Daily, Weekly, Monthly and Initial Loads.
- Provided data to the reporting team for their daily, weekly and monthly reports.
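The UDF bullet above refers to extending Hive with custom Java functions. Here is a minimal sketch using Hive's simple UDF API; the class name and normalization rule are illustrative assumptions rather than the actual UDFs built for this project.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF that trims and lower-cases a string column, returning NULL for NULL input.
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Once packaged into a JAR, a function like this would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before use in queries.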
Environment: Hadoop, Java, UNIX, HDFS, Pig, Hive, MapReduce, Sqoop, NoSQL databases (Cassandra, HBase), Linux, Flume, Oozie, Informatica PowerCenter 8.6, flat files, Oracle 10g/9i RAC, UNIX scripting.
Confidential, St. Louis, MO
Java Developer
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Implemented the presentation layer with HTML, XHTML and JavaScript.
- Developed web components using JSP, Servlets and JDBC.
- Implemented database using SQL Server.
- Designed tables and indexes.
- Wrote complex SQL queries and stored procedures (see the JDBC sketch after this list).
- Involved in fixing bugs and unit testing with test cases using Junit.
- Studied and analyzed 1404 UNIX scripts scheduled in crontab, spanning 46 database schemas, to determine the migration plan and identify candidate applications for analysis.
- Studied and analyzed the integration of the 1404 jobs into Control-M in the info dev environment.
- Studied and analyzed the SFTP implementation for these 1404 jobs.
- Defined an approach for identifying the migration path of existing UNIX jobs scheduled in crontab to the Control-M scheduler.
- Performed script migration of all crontab jobs and migrated selected scripts from FTP to SFTP (Secure File Transfer Protocol).
- Used a tool- and template-based approach to perform the migration.
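The stored-procedure bullet above pairs naturally with JDBC. The sketch below shows one way a servlet-layer DAO might call such a procedure against SQL Server; the connection URL, credentials, procedure name, and parameters are hypothetical placeholders, and a JDBC driver is assumed to be on the classpath.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ClaimStatusDao {
    // URL and credentials are placeholders for illustration.
    private static final String URL = "jdbc:sqlserver://dbhost:1433;databaseName=appdb";

    public String fetchStatus(int claimId) throws SQLException {
        Connection conn = DriverManager.getConnection(URL, "appUser", "secret");
        try {
            // usp_get_claim_status is a hypothetical procedure with one IN and one OUT parameter.
            CallableStatement cs = conn.prepareCall("{call usp_get_claim_status(?, ?)}");
            cs.setInt(1, claimId);
            cs.registerOutParameter(2, java.sql.Types.VARCHAR);
            cs.execute();
            String status = cs.getString(2);
            cs.close();
            return status;
        } finally {
            conn.close(); // always release the connection
        }
    }
}
```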
Environment: Java 1.6, UNIX Shell Scripting, MS SQL Server, Eclipse, PuTTY and WinSCP.
Confidential, Charlotte, NC
Jr. Java Developer
Responsibilities:
- Responsible for developing various modules, front-end and back-end components using several design patterns based on client’s business requirements.
- Designed and Developed application modules using Spring and Hibernate frameworks.
- Designed and developed the front end with Swing and the Spring MVC framework, using standard and custom tag libraries; developed the presentation tier with JSP pages integrating AJAX, custom tags, JSTL, HTML, JavaScript, and jQuery (see the controller sketch after this list).
- Used Hibernate to develop persistent classes following ORM principles.
- Deployed Spring configuration files such as the application context, application resources, and related application files.
- Used Java-J2EE patterns like Model View Controller (MVC), Business Delegate, Session façade, Service Locator, Data Transfer Objects, Data Access Objects, Singleton and factory patterns.
- Used JUnit for Testing Java Classes.
- Used Waterfall methodology.
- Worked with Maven for build scripts and set up the Log4J logging framework.
- Involved in the Integration of the Application with other services.
- Involved in Units integration, bug fixing, and testing with test cases.
- Fixed the bugs reported in User Testing and deployed the changes to the server.
- Managed version control for deliverables by streamlining and rebasing development streams in SVN.
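The Spring MVC bullets above describe annotation-style controllers backed by JSP views, which Spring 2.5 supports. The sketch below is a minimal illustration; the controller name, URL mapping, request parameter, and model attribute are hypothetical, and the Hibernate-backed data access is simplified away.

```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.ModelMap;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;

// Handles requests for a hypothetical account-summary page and hands the data to a JSP view.
@Controller
public class AccountSummaryController {

    @RequestMapping("/accountSummary.htm")
    public String showSummary(@RequestParam("accountId") String accountId, ModelMap model) {
        // In the real application the data would come from a Hibernate-backed service;
        // the attribute is passed through directly here to keep the sketch self-contained.
        model.addAttribute("accountId", accountId);
        return "accountSummary"; // resolved to a JSP by the configured ViewResolver
    }
}
```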
Environment: Java/JDK, J2EE, Spring 2.5, Spring MVC, Hibernate, Eclipse, Tomcat, XML, JSTL, JavaScript, Maven2, Web Services, jQuery, SVN, JUnit, Log4J, Windows, Oracle.