
Big Data Engineer Resume


NJ

PROFESSIONAL SUMMARY:

  • More than 7 years of experience with a strong emphasis on developing, implementing, configuring, and testing Hadoop ecosystem components, and on the maintenance of various web-based applications.
  • In-depth working knowledge of distributed systems architecture and parallel processing frameworks.
  • Strong knowledge of the full software development life cycle: software analysis, design, architecture, development, and maintenance.
  • Experience in ingestion, storage, querying, processing, and analysis of Big Data, with hands-on experience in the Big Data ecosystem including Apache Spark and Spark SQL.
  • Experience in HDFS, MapReduce, Pig, Hive, YARN, HBase, Sqoop, ZooKeeper, and Oozie.
  • Experience in analyzing data using Hive, Pig Latin, HBase, and custom MapReduce programs.
  • Wrote custom UDFs in Java for Hive and Pig, and custom SerDes for Hive.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the first sketch after this list).
  • Experienced in using Sqoop to import data from RDBMSs into HDFS/Hive and to export it back (see the second sketch after this list).
  • Scheduled batch job workflows using Oozie and Event Engine.
  • Good knowledge of Kafka, including sustaining streaming reads and writes of thousands of megabytes per second.
  • Good knowledge of serialization formats such as SequenceFile, Avro, Parquet, and ORC.
  • Experience in working with different data sources like Flat files, XML files and Databases.
  • Expertise in writing Splunk queries to review historical data and trends.
  • Experience in working with different Hadoop distributions such as MapR, Cloudera (CDH 5 and CDH 6), and Databricks.
  • Experience in the AWS cloud environment, including S3 storage and EC2 instances.
  • Strong experience in implementing CI/CD (continuous integration and continuous deployment) and test-driven development using Jenkins.
  • Expertise in using the ServiceNow tool for creating Incidents, Problems, Knowledge Articles, and Change Requests.
  • Hands-on experience in UNIX scripting.
  • Strong work ethic with a desire to succeed and make significant contributions to the organization.
  • Strong problem-solving, communication, and interpersonal skills; a good team player.
  • Motivated to take on independent responsibility as well as to contribute as a productive team member.
  • Presented multiple show-and-tell sessions to clients on Hadoop use cases.
  • Good understanding and experience with Software Development methodologies like Agile and Waterfall.
  • Expertise in using version control and CI systems such as SVN, GitHub, and Jenkins.
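
To make the Hive table-design bullet concrete, the first sketch below shows a partitioned, bucketed external table created through a Hive-enabled SparkSession. The table name, columns, bucket count, and path are hypothetical placeholders, not details from an actual engagement.

    from pyspark.sql import SparkSession

    # Hive support is required for external/partitioned/bucketed table DDL.
    spark = (SparkSession.builder
             .appName("hive-table-design-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # External table: Hive owns only the metadata, so dropping the table
    # leaves the files at LOCATION intact (unlike a managed table).
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS txn_history (
            txn_id STRING,
            amount DOUBLE
        )
        PARTITIONED BY (txn_date STRING)
        CLUSTERED BY (txn_id) INTO 32 BUCKETS
        STORED AS ORC
        LOCATION '/data/warehouse/txn_history'
    """)

    # Filtering on the partition column prunes the scan to the matching
    # date directories instead of reading the whole table.
    spark.sql("SELECT COUNT(*) FROM txn_history WHERE txn_date = '2020-01-15'").show()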
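
The second sketch outlines the Sqoop import pattern, wrapped in Python so it can run inside a scheduled workflow; Sqoop itself is a CLI tool, and the JDBC URL, credentials file, table, and paths below are hypothetical.

    import subprocess

    # Import one RDBMS table into HDFS; check=True surfaces failures
    # to the calling workflow as a CalledProcessError.
    subprocess.run(
        [
            "sqoop", "import",
            "--connect", "jdbc:mysql://db.example.com/sales",  # placeholder URL
            "--username", "etl_user",
            "--password-file", "/user/etl/.sqoop_password",
            "--table", "orders",
            "--target-dir", "/data/raw/orders",
            "--num-mappers", "4",
        ],
        check=True,
    )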

TECHNICAL SKILLS:

Programming Languages: Python, Scala, Java, C, C++, SQL.

Big Data technologies: Spark Core, Spark SQL, Hadoop, MapReduce, Pig, Hive, YARN, Sqoop, Oozie, Event Engine, Flume, ZooKeeper, Kafka, Elasticsearch.

Scripting Languages: Shell Script, Python.

Java Technologies: JDBC, JSP, JSON, Servlets, JUnit, REST, Web Services

Databases: Oracle, MySQL, PostgreSQL

NoSQL Databases: HBase, Cornerstone

Application Servers: WebLogic, WebSphere, Tomcat

IDEs: Eclipse, IntelliJ, Toad, SQL Developer

Operating Systems: Windows, Unix, Linux

Version Control: Git, SVN, Rational ClearCase

Development methodology: Agile, Scrum, Waterfall

Hadoop Distributions: Cloudera, Hortonworks, MapR, AWS, Databricks

PROFESSIONAL EXPERIENCE:

Confidential, NJ

Big Data Engineer

Responsibilities:

  • Design, develop, test, deploy, and support Big Data applications on an EC2-based Hadoop cluster in the Amazon Web Services (AWS) cloud.
  • Responsible for gathering and managing all requirements for the project.
  • Analyze data sources, ensure they are productionized and automated as far as possible, and ingest them into the platform.
  • Responsible for spinning up the Cloudera EC2 cluster and scaling nodes up and down as requirements change.
  • Monitor and present cost-estimate graphs, per instance type, for the AWS nodes used by the project.
  • Apply performance-related tuning on top of the default Hive/Spark settings.
  • Deploy and trigger Spark jobs in the Databricks environment on different instance types, against both DBFS and AWS S3.
  • Orchestrate workflows to automate successive steps and incorporate appropriate quality checks within the process (see the sketch after this list).
  • Automate, via workflows, all jobs that pull data from File Transfer Protocol servers to the desired destination over FTP/SFTP, STP, or mainframe deliverables.
  • Generate audit reports and metadata in the desired format, keeping all asset code in source control and following best-in-class release management and source code practices.
  • Perform code review, bug fixing and production activity when required.
  • Perform controlled releases to implement any code/asset change for standardized delivery to various end customers.
  • Develop and document design/implementation of requirements based on business needs.
  • Use JIRA board for tracking the tickets.
  • Use Confluence for bug tracking and documentation updates.
  • Use Bitbucket as the version control tool.
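
As a rough illustration of the Databricks/S3 job pattern with an in-line quality check (referenced in the list above), the sketch below reads a raw feed from S3, applies a simple gate, and lands Parquet on DBFS. The bucket, paths, and the order_id column are hypothetical placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-ingest-sketch").getOrCreate()

    # Read a raw CSV feed from S3 (placeholder bucket/path).
    raw = (spark.read
           .option("header", "true")
           .csv("s3a://example-bucket/incoming/orders/"))

    # Minimal quality gate: fail fast if the feed arrived empty or lost a
    # key column, so bad data never reaches downstream consumers.
    if raw.count() == 0 or "order_id" not in raw.columns:
        raise ValueError(f"Quality check failed: columns={raw.columns}")

    # Land the validated data as Parquet on DBFS for downstream jobs.
    raw.write.mode("overwrite").parquet("dbfs:/mnt/curated/orders/")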

Environment: Linux, Hadoop, Hive, Oozie, Spark, Python, Scala, Java, GIT, CDH5, CDH6, Databricks.

Confidential, Phoenix, AZ

Software Developer / Big Data

Responsibilities:

  • Worked on a large-scale Hadoop YARN cluster (600+ nodes) for distributed data storage, processing, and analysis.
  • Designed, developed, tested, deployed, and supported Big Data applications on the Hadoop cluster with Spark, MapReduce (Java), Sqoop, HBase, Hive, and Oozie.
  • Responsible for gathering all required information and requirements for the project.
  • Collect, aggregate, and move data from servers to HDFS using Apache Spark.
  • Improved the performance of various Spark and Hive jobs, tuning them efficiently.
  • Implemented Spark RDD transformations and actions to carry out business analysis.
  • Worked extensively with Spark DataFrames to ingest data from flat files and transform unstructured data into structured data (see the sketch after this list).
  • Used the Magellan tool to write Hive queries.
  • Encrypted sensitive data such as card numbers and account numbers across more than 100 million records.
  • Stored, accessed, and processed data in different file formats, i.e., Text, ORC, and Parquet.
  • Developed UDFs to implement complex transformations on Hadoop.
  • Supported MapReduce programs running on the cluster.
  • Managed and reviewed Hadoop log files to identify bugs.
  • Scheduled Oozie workflows and used Spring Batch to configure workflows for different job types such as Hive, MapReduce, and shell.
  • Wrote shell scripts to start use cases and perform pre-validations.
  • Automated all jobs that pull data from the FTP server and load it into Hive tables using Oozie workflows.
  • Monitored and scheduled UNIX script jobs.
  • Actively involved in code review and bug fixing to improve performance.
  • Developed and optimized Hive UDFs (user-defined functions) to implement functionality from external languages as and when required.
  • Managed, reviewed, and interpreted Hadoop log files.
  • Coordinated with the administrator team to analyze MapReduce job performance and resolve any cluster-related issues.
  • Performed platform-related Hadoop production support tasks by analyzing job logs.
  • Coordinated with different teams to determine root causes and take steps to resolve them.
  • Managed and reviewed Hadoop log files to identify issues and find root causes when jobs failed.
  • Used ServiceNow to provide application support for existing clients.
  • Created partitioned tables in Hive for best performance and faster querying.
  • Managed and scheduled jobs on the Hadoop cluster using crontab, the Oozie scheduler, and Event Engine (an in-house tool).
  • Ensured the quality and integrity of data were maintained as design and functional requirements changed.
  • Developed and documented design/implementation impacts based on system monitoring reports.
  • Reduced the number of open tickets for a couple of use cases by analyzing, categorizing, and prioritizing all recorded open issues; this required motivating the offshore team and thoughtfully delegating workable tickets to offshore resources to maximize efficiency.
  • Reduced response time to clients on issues reported in the production environment.
  • Drafted support procedures to decrease client reported ticket/issue turnaround.
  • Used the ServiceNow ITSM tool for creating Incidents, Problems, Knowledge Articles, and Change Requests.
  • Presented weekly status reports to the client on use-case progress and the issue tracker.
  • Guided offshore programmers assigned to the Production Support group.
  • Conducted code reviews to maintain code quality.
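
To illustrate the flat-file structuring and sensitive-field handling mentioned above, here is a small sketch that parses a pipe-delimited feed into typed columns and masks card numbers. The file layout and paths are hypothetical, and a production pipeline would encrypt with a vetted crypto library or KMS rather than masking.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("mask-fields-sketch").getOrCreate()

    # Hypothetical pipe-delimited layout: card_number|account_number|amount
    lines = spark.read.text("/data/incoming/transactions.txt")

    parsed = (lines
              .select(F.split("value", r"\|").alias("f"))
              .select(
                  F.col("f")[0].alias("card_number"),
                  F.col("f")[1].alias("account_number"),
                  F.col("f")[2].cast("double").alias("amount"),
              ))

    # Keep only the last four digits visible.
    masked = parsed.withColumn(
        "card_number",
        F.concat(F.lit("************"), F.substring("card_number", -4, 4)),
    )

    masked.write.mode("overwrite").orc("/data/curated/transactions")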

Environment: Linux, Hadoop, Hive, HBase, GIT, Spark, Scala, Map Reduce, Sqoop.

Confidential

Software Engineer

Responsibilities:

  • Developed MapReduce pipeline jobs to process the data and create the necessary files in Hadoop (see the sketch after this list).
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented a MapReduce-based, large-scale parallel relation-learning system.
  • Imported data using Sqoop to load from MySQL and Oracle into HDFS.
  • Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
  • Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
  • Used ZooKeeper to provide coordination services to the cluster.
  • Developed the application using MVC architecture with JSP and Servlets.
  • Involved in developing Class diagrams, Sequence Diagrams using UML.
  • Designed various interactive front-end web pages using HTML, CSS, jQuery, and Bootstrap.
  • Developed HTML and JSP pages for user interaction and data presentation.
  • Developed JSPs and Servlets to implement the business logic, and used Java Beans to retrieve the data.
  • Implemented the Spring Batch module to achieve batch transactions.
  • Designed and developed moderately complex units/modules/products that meet requirements.
  • Maintenance and upgrades (new features, refactoring, bug fixing) of existing programs using MFC and Win32 API wherever needed.
  • Performed unit/module testing of software to find errors and ensure programs met specifications.
  • Documented the work performed and the solutions built.
  • Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
  • Put forward accurate time estimates of work to be done on a project.
  • Collaborated with quality assurance team in creation of test plans and participated in reviews.
  • Participated in design and code reviews with other developers.
  • Worked in small scrum teams in an agile development environment.
  • Used Jira as the issue-tracking system, including Jira setup.
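
The pipeline jobs here were written as Java MapReduce; purely for illustration, the sketch below shows the same map/shuffle/reduce shape as a Hadoop Streaming word count in Python. Input and output paths in the submission comment are placeholders.

    #!/usr/bin/env python3
    # mapper.py -- emit one (word, 1) pair per token read from stdin.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    #!/usr/bin/env python3
    # reducer.py -- sum counts per word. Streaming delivers mapper output
    # sorted by key, so a running total per key is sufficient.
    import sys

    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

    # Submitted via the streaming jar, e.g.:
    # hadoop jar hadoop-streaming-*.jar -files mapper.py,reducer.py \
    #     -mapper mapper.py -reducer reducer.py -input /in -output /out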
