
Hadoop Developer Resume


Pawtucket, RI

SUMMARY:

  • 5+ years of experience in the IT industry across the complete software development life cycle (SDLC), including business requirements gathering, system analysis and design, data modeling, development, testing, and implementation of projects.
  • Experience in configuring, deploying, and managing different Hadoop distributions such as Cloudera (CDH4 & CDH5) and Hortonworks (HDP).
  • Experience importing and exporting data with Sqoop between the Hadoop Distributed File System (HDFS) and relational database systems. Good understanding of MapReduce programs.
  • Experience with the Hadoop ecosystem for ingestion, storage, querying, processing, and analysis of big data.
  • Experience with optimization techniques for the sorting and shuffling phases of MapReduce programs; implemented optimized joins across data from different sources.
  • Experience in defining job flows and in managing and reviewing Hadoop log files.
  • Created and maintained tables, views, procedures, functions, packages, DB triggers, and indexes.
  • Used Sqoop to import data from RDBMS sources into Hive tables. Developed MapReduce jobs in Java to preprocess data (see the MapReduce preprocessing sketch after this list).
  • Involved in HDFS maintenance and loading of structured and unstructured data.
  • Created Hive internal/external tables and worked with them using HiveQL. Responsible for managing data coming from different data sources.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data coming from different sources.
  • Experience handling various file formats such as Avro, SequenceFile, text, XML, JSON, and Parquet with compression codecs such as gzip, LZO, and Snappy.
  • Imported data from HDFS into Spark DataFrames for in-memory computation to generate optimized output and better visualizations.
  • Experience collecting real-time streaming data and building pipelines for raw data from different sources using Kafka, storing the data into HDFS and NoSQL stores using Spark (see the Kafka-to-HDFS pipeline sketch after this list).
  • Implemented a POC using Impala for data processing on top of Hive for better resource utilization.
  • Knowledge of NoSQL databases such as HBase and Cassandra and their integration with Hadoop clusters.
  • Experienced with Oozie for automating data movement between different Hadoop systems.
  • Good understanding of security requirements for Hadoop and its integration with Kerberos authentication and authorization infrastructure. Mentored analysts and the test team in writing Hive queries.
  • Experience in writing Hive queries for processing and analyzing large volumes of data.
  • Interacted effectively with members of the Business Engineering, Quality Assurance, and other teams involved in the system development life cycle.
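
A minimal sketch of a map-only Java MapReduce preprocessing job of the kind described above, dropping malformed records before they are loaded into Hive. The class names, delimiter, and expected field count are illustrative assumptions, not details from the original work:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Map-only preprocessing job: keeps only well-formed pipe-delimited records
    // and counts the malformed ones. Field count and delimiter are assumptions.
    public class RecordCleanser {

        public static class CleanseMapper
                extends Mapper<Object, Text, NullWritable, Text> {

            private static final int EXPECTED_FIELDS = 12; // hypothetical layout

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\\|", -1);
                if (fields.length == EXPECTED_FIELDS) {
                    context.write(NullWritable.get(), value);   // pass record through
                } else {
                    context.getCounter("cleanse", "malformed").increment(1);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "record-cleanse");
            job.setJarByClass(RecordCleanser.class);
            job.setMapperClass(CleanseMapper.class);
            job.setNumReduceTasks(0);                    // map-only job
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }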
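
A minimal sketch of a Kafka-to-HDFS ingestion pipeline in Java using Spark Structured Streaming, one way to implement the Kafka/Spark pipeline described above; whether the original pipeline used Structured Streaming is an assumption, and the broker address, topic name, and HDFS paths are placeholders:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.streaming.StreamingQuery;

    // Reads raw events from a Kafka topic and lands them on HDFS as Parquet.
    public class KafkaToHdfsPipeline {
        public static void main(String[] args) throws Exception {
            SparkSession spark = SparkSession.builder()
                    .appName("kafka-to-hdfs")
                    .getOrCreate();

            Dataset<Row> raw = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
                    .option("subscribe", "raw-events")                  // placeholder
                    .load();

            // Keep the message payload as a string column for downstream parsing.
            Dataset<Row> events = raw.selectExpr("CAST(value AS STRING) AS payload");

            StreamingQuery query = events.writeStream()
                    .format("parquet")
                    .option("path", "hdfs:///data/raw/events")              // placeholder
                    .option("checkpointLocation", "hdfs:///checkpoints/events")
                    .start();

            query.awaitTermination();
        }
    }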

TECHNICAL SKILLS:

Big Data Ecosystem Components: HDFS, Hadoop MapReduce, ZooKeeper, Hive, Sqoop, Spark, Kafka, Oozie, HiveQL.

GUI Tools: Hue, GitHub, GitLab, Splunk.

Database Tools: TOAD, Toad Data Point, PL/SQL Developer, SQL Developer, and SQL*Plus.

Databases: Oracle (SQL), Teradata, SQL Server.

Web Technologies: HTML, CSS, JavaScript

Operating Systems: Linux 5, UNIX, Windows XP, 7, 8, and 10.

PROFESSIONAL EXPERIENCE:

Confidential, Pawtucket, RI.

Hadoop Developer

Responsibilities:

  • Involved in the complete big data flow of the application: data ingestion from upstream sources into HDFS, processing the data in HDFS, and analyzing the data using several tools.
  • Imported data in various formats such as text, CSV, Avro, and Parquet into the HDFS cluster, with compression for optimization.
  • Ingested data from RDBMS sources such as Oracle, SQL Server, and Teradata into HDFS using Sqoop.
  • Configured Hive and participated in writing Hive UDFs and UDAFs. Also created static and dynamic partitions with bucketing (see the Hive UDF sketch after this list).
  • Imported and exported data into HDFS and Hive using Sqoop (batch) and Kafka (streaming).
  • Used Hive join queries to combine multiple tables of a source system and load them into the data lake.
  • Experience in managing and reviewing huge Hadoop log files.
  • Involved in HDFS maintenance and loading of structured and unstructured data. Implemented Data Integrity and Data Quality checks in Hadoop using Hive and Linux scripts.
  • Involved in migrating data from Oracle to the Hadoop data lake using Sqoop import. Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Created Apache Oozie workflows and coordinators to schedule and monitor various jobs, including Sqoop, Hive, and shell script actions (see the workflow-submission sketch after this list).
  • Created Data Pipelines as per the business requirements and scheduled it using Oozie Coordinators.
  • Maintained technical documentation for each step of the development process, including high-level design (HLD) and low-level design (LLD) documents.
  • Involved in developing, building, testing, and deploying to the Hadoop cluster in distributed mode.
  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Extensively used ESP workstation to schedule the Oozie jobs.
  • Understood the security requirements for Hadoop and integrated with the Kerberos authentication and authorization infrastructure.
  • Built an automated build and deployment framework using GitHub and Maven.
  • Worked with BI tools such as Tableau to create weekly, monthly, and daily dashboards and reports using Tableau Desktop and published them against the HDFS cluster.
  • Created reports using Tableau for business data visualization.
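
A minimal sketch of a Java Hive UDF of the kind referred to above; the function's behavior (trimming and upper-casing a code value) and its name are illustrative assumptions:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Simple Hive UDF that normalizes free-text codes: trims whitespace and
    // upper-cases the value. Null input yields null output.
    public final class NormalizeCode extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Once packaged into a jar, a UDF like this is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in HiveQL.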
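
A minimal sketch of submitting and polling one of the Oozie workflows mentioned above from Java via the Oozie client API; the Oozie URL, HDFS application path, and cluster addresses are placeholders. The workflow itself (Sqoop, Hive, and shell actions) is defined separately in workflow.xml, so this only shows programmatic submission and status polling:

    import java.util.Properties;

    import org.apache.oozie.client.OozieClient;
    import org.apache.oozie.client.WorkflowJob;

    // Submits a workflow application stored on HDFS and waits for it to finish.
    public class WorkflowSubmitter {
        public static void main(String[] args) throws Exception {
            OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie"); // placeholder

            Properties conf = oozie.createConfiguration();
            conf.setProperty(OozieClient.APP_PATH, "hdfs:///apps/ingest/workflow.xml"); // placeholder
            conf.setProperty("nameNode", "hdfs://nn-host:8020");   // placeholder
            conf.setProperty("jobTracker", "rm-host:8032");        // placeholder

            String jobId = oozie.run(conf);                // submit and start the workflow
            WorkflowJob job = oozie.getJobInfo(jobId);
            while (job.getStatus() == WorkflowJob.Status.PREP
                    || job.getStatus() == WorkflowJob.Status.RUNNING) {
                Thread.sleep(10_000);                      // poll every 10 seconds
                job = oozie.getJobInfo(jobId);
            }
            System.out.println(jobId + " finished with status " + job.getStatus());
        }
    }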

Environment: Hadoop, HDFS, Hive, Oozie, Sqoop, ESP Workstation, Shell Scripting, HBase, GitHub, Tableau, Oracle, MySQL

Client: JP Morgan Chase

Confidential, Columbus, Ohio

Hadoop Developer

Responsibilities:

  • Worked on a Hadoop cluster scaling from 4 nodes in the development environment to 8 nodes in pre-production and up to 24 nodes in production.
  • Involved in the complete implementation life cycle.
  • Extensively used Hive queries (HQL) to query and search for data in Hive tables stored in HDFS.
  • Created Hive managed and external tables and loaded the transformed data into them. Used Avro, JSON, and XML file formats.
  • Good Linux and Hadoop system administration skills, networking, shell scripting, and familiarity with open-source configuration management and deployment tools such as Chef.
  • Managed and scheduled jobs to remove duplicate log data files in HDFS using Oozie.
  • Utilized Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop as well as system specific jobs.
  • Used Apache Oozie for scheduling and managing Hadoop jobs; knowledgeable about HCatalog.
  • Designed and created data ingest pipelines using technologies such as Spring Integration and Apache Storm with Kafka.
  • Implemented test scripts to support test driven development and continuous integration.
  • Moved data from HDFS to the Oracle database and vice versa using Sqoop.
  • Documented the procedures performed during project development.
  • Exported data from HDFS into an RDBMS using Sqoop for report generation and visualization.
  • Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL); see the Hive JDBC sketch after this list.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which better suited the current requirements.
  • Involved in moving all log files generated from various sources to HDFS for further processing.
  • Extracted data from Teradata into HDFS using Sqoop. Supported data analysts in running MapReduce programs.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Developed the UNIX shell scripts for creating the reports from Hive data.
  • Documented the various objects and the changes I made and transferred that knowledge to the production support team.
  • Extensively used Sqoop to get data from RDBMS sources like Teradata and Oracle.
  • Collected metrics for all ingested data on a weekly basis and provided reports for the business.
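
A minimal sketch of running an HQL aggregation over a Hive warehouse table through HiveServer2 JDBC, of the sort used above to build data cubes for visualization; the connection URL, credentials, table, and column names are illustrative assumptions:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Runs a cube-style aggregation on a Hive table via HiveServer2 JDBC.
    public class HiveCubeReport {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://hive-host:10000/analytics", "etl_user", ""); // placeholder
                 Statement stmt = conn.createStatement()) {

                // GROUP BY ... WITH CUBE produces the roll-up combinations that
                // feed the visualization layer.
                ResultSet rs = stmt.executeQuery(
                        "SELECT region, product, SUM(amount) AS total "
                      + "FROM sales GROUP BY region, product WITH CUBE");  // hypothetical table

                while (rs.next()) {
                    System.out.printf("%s | %s | %s%n",
                            rs.getString("region"), rs.getString("product"),
                            rs.getString("total"));
                }
            }
        }
    }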

Environment: Linux (Red Hat), UNIX shell, Oracle, Hive, MapReduce, Core Java, JDK 1.7, Oozie workflows, Cloudera, HBase, Sqoop, Cloudera Manager.

Confidential

SQL Developer

Responsibilities:

  • Interacted with the users for understanding and gathering business requirements.
  • Designed a complex SSIS package for data transfer from three different firm sources to a single destination in SQL Server 2005.
  • Developed and optimized database designs for new applications. Data residing in the source tables was migrated into staging tables and then into final tables.
  • Implemented data views and control tools to guarantee data transformation using SSIS. Successfully deployed SSIS packages with defined security.
  • Developed the logical database design and converted it into a physical database using Erwin.
  • Wrote complex T-SQL queries and stored procedures for generating reports (see the stored-procedure call sketch after this list). Successfully worked with Report Server and configured it with SQL Server 2005.
  • Responsible for monitoring performance and optimizing SQL queries for maximum efficiency.
  • Scheduled subscription reports with the Subscription Report Wizard.
  • Involved in the analysis, design, development, testing, deployment and user training of analytical and transactional reporting system.
  • Used existing stored procedures, wrote new stored procedures and triggers, modified existing ones, and tuned them for good performance.
  • Tuned SQL queries using execution plans for better performance.
  • Optimized query performance by assigning relative weights to the tables in the catalog. Analyzed reports and fixed bugs in stored procedures.
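
For consistency with the other code sketches in this resume, a minimal JDBC example of invoking a reporting stored procedure of the kind described above; the procedure name, parameter, and connection details are illustrative assumptions (the original work itself was done in T-SQL, SSIS, and Report Server rather than Java):

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    // Calls a hypothetical reporting stored procedure on SQL Server and prints rows.
    public class MonthlyReportRunner {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:sqlserver://db-host:1433;databaseName=Reporting;encrypt=false"; // placeholder
            try (Connection conn = DriverManager.getConnection(url, "report_user", "secret");
                 CallableStatement call = conn.prepareCall("{call dbo.usp_MonthlyReport(?)}")) { // hypothetical proc

                call.setInt(1, 2012);                    // report year parameter (assumption)
                try (ResultSet rs = call.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + ": " + rs.getBigDecimal(2));
                    }
                }
            }
        }
    }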

Environment: MS SQL Server 2005/2008, SSDT, T-SQL, SQL Profiler, Execution Plan, WinMerge, Notepad++

Confidential

Associate Client Analyst

Responsibilities:

  • Perform financial analyses and rent roll reviews for assigned portfolios in accordance with CMSA guidelines, Agency requirements and internal policies and procedures
  • Research and comment on period-to-period variances, contact borrowers for additional information, and interact with other areas of servicing to ensure complete and accurate analyses are reported
  • Ensure trigger events and other loan covenants are addressed upon completion of financial analysis
  • Perform quality control reviews of financial analyses and trigger analyses
  • Work in conjunction with the Client Relations group to represent the Company to investors, trustees, rating agencies and borrowers, etc. with respect to property financial statement matters
  • Ensure all systems are updated with the results of the financial statement analysis; these systems include, but are not limited to, Asset Surveillance, Investor Query, CAG Workbench, and the Freddie Mac PRS system
  • Handle client requests relating to assigned portfolio(s) in an accurate and expedient manner
  • Monitor compliance for Financial Statement collection, analysis, and distribution and follow up with external parties
  • Manage third party vendor & client relationships
  • Domestic and international travel may be required

Environment: Microsoft Office (advanced), including Outlook, Word, PowerPoint, and Excel.
