Sr. Big Data Engineer Resume
Herndon, Virginia
SUMMARY:
- 8+ years of professional experience in the IT industry, including 4 years with Big Data tools, developing applications on the Apache Hadoop/Spark ecosystems.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on data file formats such as .txt and .csv.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and bucketing.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience in integrating Hive and HBase for effective operations.
- Developed Scala UDFs to process data for analysis (a minimal sketch follows this list).
- Experience working with file formats such as Avro, Parquet, ORC and SequenceFile, and compression codecs such as Gzip, LZO and Snappy in Hadoop.
- Strong understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase, Cassandra and MongoDB.
- Experience with CQL (Cassandra Query Language) for retrieving data from a Cassandra cluster.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala.
- Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka and Flume.
- Working knowledge of major Hadoop ecosystem components: Hive, Sqoop, Pig and Flume.
- Good experience in Cloudera, Hortonworks & Apache Hadoop distributions.
- Developed high-throughput streaming apps reading from Kafka topics and writing enriched data back to outbound Kafka topics.
- Experience in tuning and troubleshooting performance issues in Hadoop cluster.
- Wrote and tuned complex SQL and PL/SQL queries, stored procedures, triggers and indexes on databases such as MySQL and Oracle.
- Continuing to deepen working knowledge of NoSQL databases such as MongoDB.
- Hands-on scripting experience in Python and Linux/UNIX shell.
- Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Experience in developing web-based applications using Python.
- Experience in application development using Java, J2EE, EJB, Hibernate, JDBC, Struts, JSP and Servlets.
- Comfortable working under different methodologies such as Agile, Scrum and Waterfall.
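A minimal sketch of the Scala UDF usage noted above, assuming a hypothetical CSV input; the file path, column name and UDF name are illustrative only:

    import org.apache.spark.sql.SparkSession

    object UdfSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("udf-sketch").getOrCreate()

        // Hypothetical CSV input; path and column names are illustrative.
        val df = spark.read.option("header", "true").csv("/data/input/transactions.csv")

        // UDF that normalizes a free-text status column before analysis.
        spark.udf.register("normalize_status", (status: String) =>
          Option(status).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

        df.createOrReplaceTempView("transactions")
        spark.sql(
          """SELECT normalize_status(status) AS status, COUNT(*) AS cnt
            |FROM transactions
            |GROUP BY normalize_status(status)""".stripMargin).show()
      }
    }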
TECHNICAL SKILLS:
Programming Languages: Java, Python, Scala
Scripting Languages: Shell script, JavaScript, HTML, CSS, XML
Development tools: IntelliJ, Eclipse, Visual Studio, MonoDevelop
Database: MySQL, Oracle, SQL Server, HBase, Cassandra, MongoDB
Operating System: Mac OS, Windows 10, Linux
Version Control Tools: Subversion, GitHub
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Herndon, Virginia
Sr. Big Data Engineer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop, Spark with Scala.
- Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of batch and real-time streaming data.
- Developed scripts to perform business transformations on the data using Hive and Impala for downstream applications.
- Handled large datasets using partitioning, broadcast variables, and efficient joins and transformations in Spark during the ingestion process itself.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, RDDs and Scala.
- Created a common data lake for migrated data to be used by other members of the team.
- Used built-in Spark operators such as map, flatMap, filter, reduceByKey, groupByKey, aggregateByKey and combineByKey.
- Worked with different file formats (SequenceFile, Avro, RC, Parquet and ORC) and compression codecs (Gzip, Snappy, LZO).
- Developed complex ETL transformations and performed performance tuning.
- Imported and exported data with Sqoop between HDFS and relational databases (Oracle and Netezza).
- Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
- Worked extensively with Hive, SQL, Scala, Spark and shell scripting.
- Developed a data pipeline using Kafka to land streaming data in HDFS (a minimal sketch follows this section).
- Wrote Spark RDD transformations and actions, as well as Spark SQL queries over DataFrames, to import data from source systems, perform transformations and read/write operations with Spark Core, and save results to an output directory in HDFS.
- Responsible for the design and development of Spark applications using Scala to interact with Hive and MySQL databases.
- Used Oozie workflows to automate and schedule daily jobs.
- Experience with job control tools like Autosys.
- Scheduled and managed cron jobs and wrote shell scripts to generate alerts.
- Hands-on experience installing, configuring and using ecosystem components such as Hadoop MapReduce and HDFS.
Environment: Linux, Eclipse, JDK 1.8.0, Hadoop 2.9.0, HDFS, MapReduce, Hive 2.3, Kafka 2.11.2, CDH 5.4.0, Oozie 4.3.0, Sqoop 1.4.7, Tableau, shell scripting, RabbitMQ, Scala 2.12, Spark 2, Python 3.6/3.5/3.4, Maven
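A minimal sketch of the Kafka-to-HDFS pipeline referenced above, expressed with Spark Structured Streaming; the broker address, topic name and paths are illustrative, and the spark-sql-kafka connector is assumed to be on the classpath:

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-hdfs-sketch").getOrCreate()

        // Broker address, topic and paths are illustrative only.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "ingest-topic")
          .load()
          .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

        // Land the raw events in HDFS as Parquet; the checkpoint tracks progress across restarts.
        events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/landing/events")
          .option("checkpointLocation", "hdfs:///checkpoints/events")
          .start()
          .awaitTermination()
      }
    }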
Confidential, Warwick, Rhode Island
Hadoop Developer / Big Data Developer
Responsibilities:
- Developed Spark applications using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Developed Spark jobs using Scala on top of Yarn for interactive and Batch Analysis.
- Handled importing of data from various data sources, performed transformations using Spark, and loaded the data into Hive.
- Used storage formats such as Avro to access data quickly in complex queries.
- Used Spark SQL with the Scala API to read Parquet data and create the corresponding Hive tables (a minimal sketch follows this section).
- Imported data from our relational data stores to Hadoop using Sqoop.
- Created various MapReduce jobs for performing ETL transformations on the transactional and application-specific data sources.
- Used the Control-M scheduling tool to schedule daily jobs.
- Wrote Pig scripts and executed them using the Grunt shell.
- Reworked existing MapReduce batch applications for better performance.
- Developed multiple MapReduce jobs to perform data cleaning and pre-processing.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Generating the required reports using Oozie workflow and Hive queries for operations team from the ingested data.
Environment: Hadoop, gphd-2.2, HDP 2.6.2, Hive 1.2.1, Java 1.8, Spark 2.2, Apache Ambari 2.5.1, HDFS, Sqoop, MapReduce, Oozie, MySQL, Pig 0.16.0
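A minimal sketch of the Parquet-to-Hive flow referenced above, using Spark SQL from the Scala API; the input path, database, table and column names are illustrative only:

    import org.apache.spark.sql.SparkSession

    object ParquetToHiveSketch {
      def main(args: Array[String]): Unit = {
        // Hive support lets saveAsTable register the result in the Hive metastore.
        val spark = SparkSession.builder()
          .appName("parquet-to-hive-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Input path and column names are illustrative only.
        val orders = spark.read.parquet("hdfs:///data/raw/orders")
        orders.createOrReplaceTempView("orders_raw")

        // Simple summarization before exposing the data to Hive consumers.
        val daily = spark.sql(
          """SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
            |FROM orders_raw
            |GROUP BY order_date""".stripMargin)

        daily.write.mode("overwrite").saveAsTable("analytics.daily_orders")
      }
    }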
Confidential, Wilmington, Ohio
Hadoop Developer/Big Data Developer
Responsibilities:
- Worked on improving the performance of existing Hive and Pig queries.
- Developed Oozie workflows to automate Hive, Sqoop and other jobs.
- Created RDDs and pair RDDs for Spark programming.
- Analyzed the partitioned and bucketed data and computed various metrics for reporting.
- Created HBase tables to store various data formats of data coming from different systems.
- Implemented joins, aggregations, grouping and other transformations on pair RDDs in Scala (a minimal sketch follows this section).
- Responsible for managing data incoming from different sources for applications.
- Developed solutions to pre-process large sets of structured and semi-structured data in different file formats (text files, Avro data files, SequenceFiles, XML and JSON files, ORC and Parquet).
- Used Apache Maven as a build tool to manage project builds.
- Used Git for version control.
- Extracted data from RDBMSs through Sqoop, placed it in HDFS and processed it for downstream applications such as Tableau.
Environment: Hadoop, gphd-2.2, MapReduce, HDFS, Hive, Java 1.7, Spark 2.1, MySQL, Linux, Eclipse, IntelliJ, Sqoop
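A minimal sketch of the pair RDD transformations referenced above; the keys and in-memory sample data are illustrative stand-ins for ingested records:

    import org.apache.spark.sql.SparkSession

    object PairRddSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("pair-rdd-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Illustrative in-memory data keyed by a customer id.
        val sales = sc.parallelize(Seq(("c1", 120.0), ("c2", 75.5), ("c1", 42.0)))
        val names = sc.parallelize(Seq(("c1", "Acme"), ("c2", "Globex")))

        // Aggregation on a pair RDD: total sales per customer key.
        val totals = sales.reduceByKey(_ + _)

        // Join two pair RDDs on the customer key and format the result.
        val report = totals.join(names)
          .map { case (id, (total, name)) => s"$name ($id): $total" }

        report.collect().foreach(println)
      }
    }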
Confidential, Deerfield, IL
Big Data / Hadoop Engineer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in collecting, aggregating and moving log data from servers to HDFS using Flume.
- Imported and exported data between relational data sources such as DB2, SQL Server and Teradata and HDFS using Sqoop.
- Ingested data from legacy and upstream systems into HDFS using Apache Sqoop, Flume and Hive queries.
- Involved in creating Hive tables, loading them with data and writing Hive queries.
- Designed and implemented incremental imports into Hive tables (a minimal sketch follows this section).
- Knowledge of running Hive queries through Spark SQL integrated with the Spark environment.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
- Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
- Exported data to Tableau and Excel with Power View for presentation and refinement.
Environment: Hadoop, gphd-1.2, MapReduce, HDFS, Hive, Java 1.6/1.7, Pig, Linux, Eclipse, RabbitMQ, Zookeeper, PostgreSQL, Control-M, Redis, Tableau, QlikView, DataStax
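A minimal sketch of the incremental load referenced above, expressed with Spark SQL against a partitioned Hive table; the staging path, database, table and column names are illustrative, and a Sqoop incremental import is assumed to have landed the day's delta:

    import org.apache.spark.sql.SparkSession

    object IncrementalLoadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("incremental-load-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Staging path, table and column names are illustrative only.
        // A Sqoop incremental import is assumed to have written the delta here.
        val delta = spark.read.parquet("hdfs:///data/staging/accounts_delta")
        delta.createOrReplaceTempView("accounts_delta")

        // Append only the new partition so existing history stays untouched.
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql(
          """INSERT INTO TABLE warehouse.accounts PARTITION (load_date)
            |SELECT account_id, balance, status, load_date
            |FROM accounts_delta""".stripMargin)
      }
    }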
Confidential, Tallahassee, FL
Senior Java Developer
Responsibilities:
- Involved in Joint Application Design (JAD) sessions to analyze software specifications and identification of application functionalities for design, development and testing.
- Developed the Administrator module of the application to maintain system code tables and individuals, and to manage roles.
- Developed the location search module for users to maintain site-related details such as programs, phases and activities.
- Developed the financial management module for users to maintain contracts, task assignments and deliverables.
- Responsible for producing reports on screen, in Excel format and in PDF format using Apache POI and iText PDF.
- Extensively used Hibernate criteria queries to implement the search functionality throughout the system.
- Implemented the Hibernate cache for optimizing search performance.
- Implemented role-based security at the action level throughout the system.
- Used map-direct for marking the location on the map.
- Implemented pagination throughout the system using the display tag.
- Developed J2EE based screens for user to manage contract, site and other functionalities. Used JSP, HTML, CSS and AJAX for the enriched front end.
- Used the Spring framework along with Struts to provide IoC features to the application.
- Extensively used the Java Collections framework and exception handling.
- Responsible for integrating with an existing application, CMSS, which manages a contract after execution, while ERIC manages the contract before execution.
- Wrote Data Access Objects (DAOs) for fetching and storing client data in the database.
- Worked with Struts 2 validations to validate user input data.
- Implemented design changes for change requests in a way that did not impact the current implementation.
- Designed wireframes to present to the customer in JAD sessions and updated the design document.
- Involved in functional and regression testing of the application.
- Provided support to the application during different releases and updating the documentation through all the releases.
Environment: Java 1.7, Struts 2, Hibernate 4.0, Spring 3.0, JSP, WebLogic, Maven, Log4j 1.4, Oracle 10g, iText PDF 5.4, jQuery, AJAX, IntelliJ, Continuum, Subversion, Apache POI, JIRA, JSON
Confidential, Tallahassee, FL
Senior Java Developer
Responsibilities:
- Analyzing software specifications and identification of application functionalities for design, development and testing.
- Worked with the framework team to design the baseline architecture for the project, write code and establish project coding standards.
- Coded and implemented the proposed design, including unit testing and integration testing.
- Used the Spring framework along with Struts to provide IoC features to the application.
- Developed various user interfaces using JSP and Struts TLDs, and used CSS for page styling.
- Extensively used Java Collection framework and Exception handling.
- Responsible for creating the SNAP Operator Login for providers to create and access the benefit information.
- Responsible for generating a PDF summary for the customer to review the details entered on the application.
- Responsible for customer identity verification and authentication by consuming a LexisNexis web service called Identity Verification and Criminal Verification.
- Developed UNIX shell scripts for checking the health status of JVMs and creating logs on a timely basis.
- Responsible for migrating the Check My Benefits module from JSF to Struts 1.3.
- Responsible for integrating the existing application for AMS user to view benefit details.
- Wrote Data Access Objects (DAOs) for fetching and storing client data in the database.
- Extensively used the Java Collections framework and the Hibernate Validator framework to validate user inputs.
Environment: Java 1.6, Struts 1.3, Hibernate 3.0, Spring 3.0, JSP, IBM WebSphere 8.0, Web Services, Log4j 1.4, Ant, Oracle 10g, iText PDF 5.4
Confidential
Java Developer
Responsibilities:
- Involved in maintaining and supporting the G5 system.
- Worked on different task orders as part of enhancements to the existing system.
- Worked primarily on e-Grants part of G5.
- Provided management with issue tracking and prioritization.
- Participated in functional specifications and user requirement gathering.
- Developed visualizations to represent large datasets using JSON and D3, displaying various statistics graphically.
- Configured and used the FindBugs tool to find code bugs and fixed the issues it reported.
- Involved in upgrading WebSphere Portal Server from version 6 to version 7.
- Facilitated release-related activities, including design and code reviews.
- Developed the review model using IBM JSF Portlets and Hibernate.
Environment: Java/J2EE, WebSphere Portal Server 7, RAD 8, JSF 1.2, Hibernate 3, Oracle 11g, JSON, Rational ClearCase, Rational ClearQuest, FindBugs 2.0