
Senior Big Data/Hadoop Engineer Resume


Hartford, CT

SUMMARY

  • 8+ years of work experience in the IT industry in analysis, design, development and maintenance of various software applications, mainly in Hadoop (Cloudera, Hortonworks), Oracle (SQL, PL/SQL) and database administration, in UNIX and Windows environments, across industries such as Banking, Financial Services, Pharmacy, Fixed Income and Health Insurance.
  • 5+ years of experience in Hadoop 1.x/2.x, CDH3U6, HDFS, HBase, Spark, Sqoop 2.x, Scala, Hive 0.7.1, Pig, MapReduce, Kafka, Flume, Java 1.6, Linux, Eclipse Juno, Impala, Oozie, XML, JSON, Maven, Zookeeper and Hue.
  • Experience working in the Spark ecosystem using Spark Context, Spark-SQL, DataFrames and Pair RDDs to read/write different file formats (a minimal sketch follows this list).
  • Worked in large-scale database environments such as Hadoop MapReduce, with a working understanding of Hadoop clusters, nodes and the Hadoop Distributed File System (HDFS).
  • Working experience with ingestion, storage, querying, processing and analysis of big data using the Hadoop environment and Apache Spark.
  • Experience working with Kafka for operational metrics, stream processing and log aggregation.
  • Proficient in composing HiveQL queries and Pig Latin scripts for development and reporting.
  • Experience in writing UDFs and UDAFs to extend functionality in Hive and Pig.
  • Solid experience in performing optimization techniques such as distributed cache, map-side joins, partitioning and bucketing.
  • Understanding of loading streaming data directly into HDFS using Flume and Spark Streaming.
  • Worked on user interface modules using HTML, JavaScript and Python.
  • Experience in tuple processing and writing data with Storm using Storm-Kafka connectors.
  • Hands-on experience with the MapReduce programming model in Java for batch processing of data stored in HDFS.
  • 2+ years of data modeling and data analysis experience using dimensional and relational data modeling.
  • 3+ years of database administration experience with Oracle 10g and 11g, including schema setup, backup and recovery, database environment setup, installation/configuration, and physical and logical data modeling.
  • Solid experience and understanding of VMs, UNIX and shell scripting.
  • Excellent knowledge of MySQL and NoSQL databases such as HBase and MongoDB, and their integration with Hadoop.
  • ETL experience using Informatica PowerCenter 9.x/10.x (Repository Admin, Repository Manager, Designer, Workflow Manager and Workflow Monitor).
  • Experience in working with various data formats such as SequenceFile, Avro, RC, ORC and Parquet files.
  • Experience working in Cloudera and Hortonworks environments.
  • Proficiency in writing complex SQL queries, with extensive experience in Oracle RDBMS 10g/11g design and in developing stored procedures/packages, triggers and cursors using SQL and PL/SQL.
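
The following is a minimal Scala sketch, against the Spark 1.6-era SQLContext API, of the DataFrame and Pair-RDD file-format handling named in the bullets above; all paths and the application name are hypothetical placeholders, not taken from any project below.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object FormatDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("format-demo")) // hypothetical app name
    val sqlContext = new SQLContext(sc)

    // DataFrame route: read JSON, write Parquet (paths are placeholders).
    val events = sqlContext.read.json("hdfs:///in/events.json")
    events.write.parquet("hdfs:///out/events.parquet")

    // Pair-RDD route: word counts over a plain text file.
    val counts = sc.textFile("hdfs:///in/logs.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))   // pair RDD of (word, count)
      .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs:///out/wordcounts")
  }
}
```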

TECHNICAL SKILLS

Hadoop Ecosystem: HDFS (Hadoop Distributed File System), Apache Hadoop, Apache Spark, Scala, Kafka, Storm, Spark Streaming, Hive, Sqoop, Flume, Oozie, Zookeeper, replication, Cloudera cluster, Apache Pig, MapReduce, Cassandra, NoSQL, MongoDB, SaaS, PowerCenter, relational, hierarchical and graph databases, data federation and query optimization

Big Data Platform: Cloudera Hadoop CDH 5.x, MapReduce (MRv1, MRv2/YARN), Hortonworks Sandbox

Programming Languages: Java, C, PL/SQL, Pig Latin, UNIX Shell Scripting, Python

Internet Technologies: PHP, JSP, XML, HTML, CSS, JavaScript

Database Technologies: PL/SQL, MySQL, Oracle Forms and Reports 9i

Operating Systems: Windows 2012, UNIX

Banking Systems: FLEXCUBE, Interfacing with GIW, TRANSACTOR

PROFESSIONAL EXPERIENCE

Confidential, Hartford, CT

Senior Big Data/Hadoop Engineer

Responsibilities:

  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop (see the sketch after this list).
  • Created Hive schemas using performance techniques such as partitioning and bucketing.
  • Developed analytical components using Spark with Scala and Spark Streaming.
  • Imported data from different sources such as HDFS/HBase into Spark RDDs.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Developed Hive UDFs and Pig UDFs using Python in a Microsoft HDInsight environment.
  • Developed a wrapper in Python for instantiating a multi-threaded application and running it with other applications.
  • Developed a test framework using Python.
  • Automated execution of tests in batch using shell scripting.
  • Developed Python APIs to dump the array structures in the processor at the failure point for debugging.
  • Involved in creating POCs to ingest and process streaming data using Spark and HDFS.
  • Analyzed data using HiveQL and custom MapReduce programs.
  • Worked on the back end using Scala and Spark to perform several aggregation steps.
  • Explored different implementations in the Hadoop environment for data extraction and summarization using packages such as Hive.
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the enterprise data warehouse (EDW).
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Followed the Agile Scrum methodology that leverages the client's big data platform, and used Git for version control.
  • Supported code/design analysis, strategy development and project planning.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
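
A minimal sketch of the kind of Spark aggregation-and-write-back described above, assuming the Spark 1.6-era API; the claims table, column names and JDBC connection details are hypothetical, and Spark's JDBC writer stands in here for the Sqoop export step named in the first bullet.

```scala
import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

object ClaimAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("claim-aggregation"))
    val sqlContext = new SQLContext(sc)

    // Read the staged data from HDFS as a DataFrame (path is a placeholder).
    val claims = sqlContext.read.parquet("hdfs:///data/claims")

    // DataFrame-side aggregation: total and average amount per state.
    val byState = claims.groupBy("state")
      .agg(sum("amount").as("total_amount"), avg("amount").as("avg_amount"))

    // Write the result back to the RDBMS over JDBC; connection details are
    // hypothetical (the bullet above names Sqoop for this step).
    val props = new Properties()
    props.setProperty("user", "etl_user")
    props.setProperty("password", sys.env.getOrElse("DB_PASS", ""))
    byState.write.mode("overwrite")
      .jdbc("jdbc:mysql://db-host:3306/reporting", "claim_totals_by_state", props)
  }
}
```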

Environment: Hadoop 2.x MR1, Apache Spark 1.6.x, HDFS, HBase 2.x, Scala 2.10.5, Hive 0.7.1, Eclipse, Sqoop 2.x, Python, Zeke, MySQL, Java (JDK 1.7).

Confidential, Irving, TX

Sr. Big Data/Hadoop Engineer

Responsibilities:

  • Created Hive tables and worked on them using HiveQL.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand client behavior.
  • Developed and supported MapReduce programs running on a 50-node cluster.
  • Developed Spark scripts using Scala shell commands as per the requirements to read/write JSON files.
  • Optimized existing algorithms in Hadoop using Spark Context, Spark-SQL, DataFrames and Pair RDDs to read/write JSON files.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Handled real-time data transaction logs using Apache Kafka (see the sketch after this list).
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Implemented an ETL process using Hive and Pig, including Python and Java UDFs for cleansing the data.
  • Hands-on experience with real-time processing using Spark (with Python) and Storm.
  • Developed MapReduce programs in Java for data extraction, transformation and aggregation for multiple file formats, including XML and JSON.
  • Gathered the business requirements from the business partners and subject-matter experts.
  • Worked with Scala in writing the framework applications.
  • Validated NameNode and DataNode status in an HDFS cluster.
  • Held weekly meetings with technical stakeholders and participated actively in code review sessions with senior and junior developers.
  • Managed and reviewed the Hadoop log files.
  • Responsible for managing data originating from various sources.
  • Supported the HBase architecture design with the Hadoop architect group to develop a database design in HDFS.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and RDDs.
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Wrote MapReduce jobs using the Java API.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Developed UDFs for Pig data analysis.
  • Involved in managing and reviewing Hadoop log records.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Handled importing of data from different data sources and performed transformations using Hive and MapReduce.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed Hive queries to process the data and analyzed the resulting data cubes for visualization.
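
A minimal sketch of consuming transaction logs from Kafka with Spark Streaming's direct Kafka API (the spark-streaming-kafka 0.8 integration that matches the Spark 1.6.x environment below); the broker address and topic name are hypothetical.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object TxnLogStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("txn-log-stream"), Seconds(10))

    // Direct stream over the topic; records arrive as (key, value) string pairs.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // hypothetical broker
    val logs = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("txn-logs"))                            // hypothetical topic

    // Count error records per 10-second micro-batch.
    logs.map(_._2)
      .filter(_.contains("ERROR"))
      .count()
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```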

Environment: Hadoop 2.x MR1, Apache Spark 1.6.x, HDFS, HBase 2.x, Kafka, Hive 0.7.1, Pig, Eclipse, Sqoop 2.x, Python, Flume 0.9.3, Oracle 11i, MySQL, Oozie, Java (JDK 1.7).

Confidential, Chandler, AZ

Sr. Big Data/Hadoop Engineer

Responsibilities:

  • Developed HiveQL scripts to perform sentiment analysis (analyzed customers' comments and product ratings; see the sketch after this list).
  • Involved in the design and development of technical specification documents using Hadoop.
  • Developed MapReduce programs to parse the raw data, populate tables and store the refined data in partitioned tables. Managed and reviewed Hadoop log files.
  • Developed and composed Apache Pig scripts to process the HDFS data.
  • Migrated the required data from the Oracle product INFORM to HDFS using Sqoop, and imported various formats of flat files into HDFS.
  • Defined workflows according to their dependencies in Oozie.
  • Worked with Kafka for real-time data analytics and on queuing and messaging systems.
  • Maintained system integrity of all sub-components related to Hadoop.
  • Worked on Apache Hadoop clusters.
  • Involved in onsite support from Poland at the client location, a bank.
  • Supported intraday issues and end-of-day batch processing for Warsaw, Western Europe, London and CEEMEA countries.
  • Configured and installed Hadoop ecosystem components (Hive/Pig/HBase/Sqoop/Flume).
  • Designed and implemented a distributed data storage system based on HBase and HDFS.
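
A minimal sketch of the rating-based sentiment aggregation described in the first bullet, expressed as HiveQL run through a Spark HiveContext; the product_reviews table and its columns are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object RatingSentiment {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rating-sentiment"))
    val hive = new HiveContext(sc)

    // Bucket ratings into positive/negative counts per product.
    hive.sql(
      """SELECT product_id,
        |       AVG(rating)                                  AS avg_rating,
        |       SUM(CASE WHEN rating >= 4 THEN 1 ELSE 0 END) AS positive_reviews,
        |       SUM(CASE WHEN rating <= 2 THEN 1 ELSE 0 END) AS negative_reviews
        |FROM product_reviews
        |GROUP BY product_id""".stripMargin)
      .show()
  }
}
```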

Environment: Hadoop 1.x/2.x, Hive, Apache Spark 1.6.x, Pig, MapReduce, Oozie, HBase 0.90.x, Sqoop, Flume 0.9.3, Hue, Eclipse, Apache Tomcat 4.1.30, Linux, Java 1.6/1.5, JSP, HTML, CSS, JavaScript.

Confidential

Sr. Big data/Hadoop Developer

Responsibilities:

  • Analyzed client systems and gathered requirements.
  • Designed technical solutions based on the client's needs and system architecture.
  • Collected data from Teradata and pushed it into Hadoop using Sqoop.
  • Pre-processed data using MapReduce and stored it into Hive tables.
  • Cleansed data using Hive and stored it in HDFS for analysis.
  • Took care of the implementation side of the big data solution.
  • Extracted data from various log files and loaded it into HDFS under Hadoop.
  • Extracted data from an RDBMS (Oracle) into Hadoop and vice versa using Sqoop.
  • Developed and built Hive tables using HiveQL.
  • Developed scripts using HiveQL for analysis purposes.
  • Developed UDFs to perform the necessary actions in Pig and Hive queries (see the sketch after this list).
  • Tuned Hive queries to get the best performance.
  • Wrote Apache Pig scripts for data analysis on top of the HDFS files.
  • Used Hive on the RDBMS data for data analysis and stored the results back to the database.
  • Developed Apache Pig scripts to extract the data from the web server output files and load it into HDFS.
  • Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing it with Pig scripts.
  • Worked with the Spark ecosystem using Scala and Hive queries on different data formats, and developed POCs with Spark-SQL.
  • Involved in installing and configuring the required Hadoop ecosystems using EMR over clusters (AWS EC2 with S3 storage).
  • Worked on NoSQL databases such as HBase and Cassandra.
  • Designed and developed database stored procedures, triggers and functions to ensure adherence to standards and overall quality.
  • Used GitHub for version control and managing the code.
  • Performed aggregation over large amounts of log data collected using Apache Flume, staging the data in HDFS for further analysis.
  • Managed and communicated with clusters using Zookeeper.
  • Gained exposure to Tableau for reporting and analysis purposes.
  • Provided solutions to various problems in Hadoop.
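
A minimal sketch of a Hive UDF of the kind named above, written in Scala against the classic org.apache.hadoop.hive.ql.exec.UDF API; the function name and cleansing rule are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Hypothetical cleansing UDF: trims whitespace and upper-cases a code column.
class NormalizeCode extends UDF {
  def evaluate(value: String): String =
    if (value == null) null else value.trim.toUpperCase
}
```

Once packaged into a jar, a UDF like this would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode'.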

Environment: CDH 5.5, Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, Oozie, MySQL, Hbase, Scala, Flume, Zookeeper, Gzip, Snappy, Tableau, Linux.

Confidential

Hadoop Consultant

Responsibilities:

  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Automated all the jobs for pulling data from the FTP server and loading it into Hive tables, using Oozie workflows.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios (see the sketch after this list).
  • Performed key validations of each screen by writing triggers, procedures and functions along with the associated objects, events and methods.
  • Extensively used joins, triggers, stored procedures and functions when interacting with the backend database using SQL.
  • Created menus in order to navigate from one screen to the next.
  • Responded to online queries related to FLEXCUBE.
  • Provided solutions/explanations for the problems and queries raised.
  • Provided cross-version and multi-country support.
  • Reported status to the client and managed the required solutions.
  • Took onsite assignments to deal with critical sites and their cluster queries.
  • Modified existing complex forms according to client needs.
  • Responsible for production support issues involving bug fixes, user acceptance testing and integration testing.
  • Responsible for maintaining technical and functional documentation.
  • Connected to the database to manipulate data using JDBC.
  • Generated important reports relating to FLEXCUBE and responded to queries on other products.
  • Supported code/design analysis, strategy development and project planning.
  • Commissioned and decommissioned Hadoop nodes and performed data re-balancing.
  • Loaded data into Parquet files by applying transformations using Impala.
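
A minimal sketch of creating an HBase table for the mixed structured/unstructured feeds described above, using the HBase 1.x-style client API; the table and column-family names are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

object CreatePortfolioTable {
  def main(args: Array[String]): Unit = {
    val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val admin = conn.getAdmin
    try {
      val desc = new HTableDescriptor(TableName.valueOf("portfolio_events"))
      desc.addFamily(new HColumnDescriptor("raw"))  // unstructured payloads
      desc.addFamily(new HColumnDescriptor("meta")) // parsed attributes
      if (!admin.tableExists(desc.getTableName)) admin.createTable(desc)
    } finally {
      admin.close()
      conn.close()
    }
  }
}
```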

Environment: Hadoop 2.x MR1, HDFS, HBase 2.x/0.90.x, Azure, Amazon Redshift, Spark, Flume 0.9.3, Impala, Kerberos security, Sqoop 2.x, Hive 0.7.1, Tableau 9.3 (Online, Desktop, Public, Vizable), Java, Eclipse, PL/SQL, Linux, Oracle 10g, AJAX, MySQL

Confidential

ETL Consultant / Developer

Responsibilities:

  • Coordinated in supporting various production issues and emergency issues of FLEXCUBE production for the bank team.
  • Acquired both banking functional knowledge and technical knowledge in PL/SQL, Forms 10g and UNIX, including preparation of weekly reports and attending client meetings to discuss the pending items and testing instructions.
  • Responsible for gathering and analyzing requirements and converting them into technical specifications.
  • Developed the presentation layer using JSP, Java, HTML and JavaScript.
  • Designed and developed the Hibernate configuration and the session-per-request design pattern for making database connectivity and accessing the session for database transactions. Used HQL and SQL for fetching and storing data in databases.
  • Participated in the design and development of the database schema and entity-relationship diagrams of the backend Oracle database tables for the application.
  • Developed database access objects for various modules.
  • Involved in unit testing, integration testing and system testing.
  • Designed and developed stored procedures and triggers in Oracle to cater to the needs of the entire application. Developed complex SQL queries for extracting data from the database.

Environment: PL/SQL, Linux, Oracle 10g, JavaScript, CSS, UNIX Shell Scripting, AJAX, Eclipse IDE.

Confidential

Java Developer

Responsibilities:

  • Implemented the proposed SOA architecture, providing interoperable business processes and integrating services from different applications, mostly through web services using the JAXB and JAX-RPC packages.
  • Gathered requirements and developed the design-related artifacts.
  • Worked on the creation of unit test plans and integration test plans.
  • Developed and coded JUnit test cases for testing business logic.
  • Developed and provided JProfiler reports to the client, along with release docs, with every release.
  • Managed timely builds and releases.
  • Held domain sessions for the team for a better understanding of the client's business and requirements.
  • Played a key role in implementing the persistence framework using Hibernate and keeping it in sync with new changes in the data model as an upgrade with every release.
  • Developed the EJB Timer architecture using both synchronous and asynchronous queues and topics with JMS.
  • Developed modules in core Java at the business tier and developed the JSPs with the Struts 2.0 presentation framework at the front end.
  • Did core Java development of the login/logout and partner-search modules.
  • Designed, developed and implemented the Excel generation framework for the entire application using Apache POI (see the sketch after this list).
  • Held sessions on risk management and coding standards for the team.
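
A minimal sketch of the Apache POI Excel-generation idea named above, written in Scala for consistency with the other sketches (the original module was Java); the sheet name, columns and output file are hypothetical placeholders.

```scala
import java.io.FileOutputStream
import org.apache.poi.xssf.usermodel.XSSFWorkbook

object ExcelReport {
  def main(args: Array[String]): Unit = {
    val workbook = new XSSFWorkbook()
    val sheet    = workbook.createSheet("Partners") // hypothetical sheet name

    // Header row followed by one placeholder data row.
    val header = sheet.createRow(0)
    header.createCell(0).setCellValue("Partner")
    header.createCell(1).setCellValue("Region")
    val row = sheet.createRow(1)
    row.createCell(0).setCellValue("Acme Corp")
    row.createCell(1).setCellValue("EMEA")

    val out = new FileOutputStream("partners.xlsx")
    try workbook.write(out) finally { out.close(); workbook.close() }
  }
}
```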

Environment: Spring, Struts 2.0, JavaScript, HTML, RAD, Java, AJAX, WebSphere Application Server, JSP, Servlets, EJB, Web services, DAO, Hibernate, Oracle, XML, SOAP, JUnit, ANT.
