Big Data Engineer Resume
NJ
PROFESSIONAL SUMMARY:
- More than 7 years of experience with a strong emphasis on developing, implementing, configuring, and testing Hadoop ecosystem components, and on the maintenance of various web-based applications.
- In-depth working knowledge of Distributed Systems Architecture and Parallel Processing Frameworks.
- Strong knowledge of the full software development life cycle: analysis, design, architecture, development, and maintenance.
- Experience in the ingestion, storage, querying, processing, and analysis of Big Data, with hands-on experience in the Big Data ecosystem, including Apache Spark and Spark SQL.
- Experience in HDFS, MapReduce, Pig, Hive, YARN, HBase, Sqoop, ZooKeeper, and Oozie.
- Experience in analyzing data using Hive, Pig Latin, HBase, and custom MapReduce programs.
- Wrote custom UDFs in Java for Hive and Pig, as well as custom SerDes for Hive (a minimal UDF sketch follows this list).
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the DDL sketch after this list).
- Experienced in using Sqoop to import data from RDBMSs into HDFS/Hive and to export data from HDFS/Hive back to RDBMSs.
- Involved in scheduling batch job workflows using Oozie and Event Engine.
- Good knowledge of Kafka, including sustaining reads and writes of thousands of megabytes per second on streaming data (see the producer sketch after this list).
- Good knowledge of serialization formats like SequenceFile, Avro, Parquet, and ORC.
- Experience in working with different data sources like flat files, XML files, and databases.
- Expertise in writing Splunk queries to review historical data and trends.
- Experience in working with different Hadoop distributions, such as MapR, Cloudera (CDH 5 and CDH 6), and Databricks.
- Experience in the AWS cloud environment, including S3 storage and EC2 instances.
- Strong experience in implementing CI/CD (continuous integration and continuous deployment) and test-driven development using Jenkins.
- Expertise in using the ServiceNow tool to create Incidents, Problems, Knowledge Articles, and Change Requests.
- Hands-on experience with UNIX scripting.
- Strong work ethic with a desire to succeed and make significant contributions to the organization.
- Strong problem-solving, communication, and interpersonal skills; a good team player.
- Motivated to take on independent responsibility, with the ability to contribute as a productive team member.
- Presented multiple show-and-tell sessions to clients on Hadoop use cases.
- Good understanding and experience with Software Development methodologies like Agile and Waterfall.
- Expertise in using version control systems like SVN and GitHub, and CI tools like Jenkins.
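A minimal sketch of a custom Hive UDF of the kind referenced above, written in Scala for the JVM (the class name, the masking rule, and the registration commands are illustrative, not taken from an actual project):

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Illustrative UDF: masks all but the last four characters of an account
// number. Hive discovers evaluate() via reflection (classic UDF API).
class MaskAccount extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val visible = 4
    val masked =
      if (s.length <= visible) s
      else "*" * (s.length - visible) + s.takeRight(visible)
    new Text(masked)
  }
}

// Registered in Hive with (hypothetical jar/class names):
//   ADD JAR /tmp/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccount';
//   SELECT mask_account(acct_number) FROM accounts;
```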
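A sketch of the managed/external, partitioned table design mentioned above, issued as HiveQL through a Spark session (table names, columns, and locations are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-ddl-sketch")
  .enableHiveSupport()
  .getOrCreate()

// External table: Hive owns only the metadata; dropping the table leaves
// the files at LOCATION intact. Partitioning by date lets queries prune
// whole directories instead of scanning everything.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS txns_ext (
    txn_id BIGINT,
    amount DOUBLE,
    card_id STRING)
  PARTITIONED BY (txn_date STRING)
  STORED AS ORC
  LOCATION '/data/warehouse/txns_ext'
""")

// Managed (internal) table: Hive owns both the metadata and the data.
spark.sql("""
  CREATE TABLE IF NOT EXISTS txns_managed (
    txn_id BIGINT,
    amount DOUBLE,
    card_id STRING)
  PARTITIONED BY (txn_date STRING)
  STORED AS ORC
""")

// Bucketing, issued in Hive itself, would additionally declare:
//   CLUSTERED BY (card_id) INTO 32 BUCKETS
```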
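And a minimal Kafka producer sketch for the high-throughput streaming writes mentioned above (broker address, topic name, and tuning values are placeholders):

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092") // placeholder broker
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
// Batching and compression are the usual levers for raising write throughput.
props.put("linger.ms", "20")
props.put("batch.size", (256 * 1024).toString)
props.put("compression.type", "lz4")

val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord[String, String]("events", "key-1", "payload"))
producer.close()
```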
TECHNICAL SKILLS:
Programming Languages: Python, Scala, Java, C, C++, SQL.
Big Data technologies: Spark Core, Spark SQL, Hadoop, MapReduce, Pig, Hive, YARN, Sqoop, Oozie, Event Engine, Flume, ZooKeeper, Kafka, Elasticsearch.
Scripting Languages: Shell Script, Python.
Java Technologies: JDBC, JSP, JSON, Servlets, JUnit, REST, Web Services
Databases: Oracle, MySQL, PostgreSQL
NoSQL Databases: HBase, Cornerstone
Application Servers: WebLogic, WebSphere, Tomcat
IDEs and database tools: Eclipse, IntelliJ, Toad, SQL Developer
Operating Systems: Windows, Unix, Linux
Version Control: Git, SVN, Rational ClearCase
Development methodology: Agile, Scrum, Waterfall
Hadoop Distributions: Cloudera, Hortonworks, MapR, AWS, Databricks
PROFESSIONAL EXPERIENCE:
Confidential, NJ
Big Data Engineer
Responsibilities:
- Design, develop, test, deploy, and support Big Data applications on an EC2 Hadoop cluster in the Amazon Web Services (AWS) cloud.
- Responsible for gathering and managing all requirements for the project.
- Analyze all data sources, ensure they are productionized, and automate their ingestion into the platform as much as possible.
- Responsible for spinning up the Cloudera EC2 cluster and scaling nodes up or down according to requirements.
- Monitor and present cost-estimate graphs for the AWS nodes utilized by the project, per instance type.
- Apply performance-related tuning on top of the default Hive/Spark settings.
- Deploy and trigger Spark jobs in the Databricks environment on different instance types, working against DBFS and AWS S3 (see the sketch after this list).
- Orchestrate workflows to automate successive steps and incorporate appropriate quality checks within the process.
- Automate all jobs that pull data from the File Transfer Protocol server into the desired destination via FTP/SFTP, STP, or mainframe deliverables, using workflows.
- Generate audit reports and metadata in the desired format, keeping all asset code in source control and following best-in-class release management and source code practices.
- Perform code review, bug fixing and production activity when required.
- Perform controlled releases to implement any code/asset change for standardized delivery to various end customers.
- Develop and document design/implementation of requirements based on business needs.
- Use JIRA board for tracking the tickets.
- Use Confluence for bug tracking and documentation updates.
- Use Bitbucket as version control tool.
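A minimal sketch of the kind of Spark job deployed and triggered here, reading CSV from S3 and writing Parquet back (bucket names, paths, and the filter condition are placeholders, not actual project assets):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object S3JobSketch {
  def main(args: Array[String]): Unit = {
    // On Databricks a configured SparkSession already exists; getOrCreate()
    // reuses it, and s3a:// (or dbfs:/) paths resolve through the cluster's
    // credentials.
    val spark = SparkSession.builder().appName("s3-job-sketch").getOrCreate()

    val raw = spark.read
      .option("header", "true")
      .csv("s3a://example-bucket/incoming/")    // placeholder source bucket

    raw.filter(col("status") === "ACTIVE")      // illustrative transformation
      .write
      .mode("overwrite")
      .parquet("s3a://example-bucket/curated/") // placeholder destination

    spark.stop()
  }
}
```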
Environment: Linux, Hadoop, Hive, Oozie, Spark, Python, Scala, Java, GIT, CDH5, CDH6, Databricks.
Confidential, Phoenix, AZ
Software developer / Big Data
Responsibilities:
- Worked on a large-scale Hadoop YARN cluster (600+ nodes) for distributed data storage, processing, and analysis.
- Designed, developed, tested, deployed, and supported Big Data applications on the Hadoop cluster with Spark, MapReduce (Java), Sqoop, HBase, Hive, and Oozie.
- Responsible for gathering all required information and requirements for the project.
- Collect, aggregate, and move data from servers to HDFS using Apache Spark.
- Improved the performance of various Spark and Hive jobs, tuning them efficiently.
- Implemented Spark RDD transformations and actions to support business analysis.
- Worked extensively with Spark DataFrames to ingest flat files and transform unstructured data into structured data (a combined sketch follows this list).
- Used the Magellan tool to write Hive queries.
- Involved in encrypting sensitive data, such as card numbers and account numbers, across more than 100 million records.
- Stored, accessed, and processed data in different file formats, i.e., text, ORC, and Parquet.
- Developed UDFs to implement complex transformations on Hadoop.
- Supported MapReduce programs running on the cluster.
- Managed and reviewed Hadoop log files to identify bugs.
- Scheduled Oozie workflows and used Spring Batch to configure workflows for different job types, such as Hive, MapReduce, and shell.
- Wrote shell scripts to start use cases and perform pre-validations.
- Automated all jobs that pull data from the FTP server into Hive tables using Oozie workflows.
- Monitored and scheduled UNIX scripting jobs.
- Actively involved in code reviews and bug fixes to improve performance.
- Developed and optimized Hive UDFs (User-Defined Functions) to implement the functionality of external languages as and when required.
- Managed, reviewed and interpreted Hadoop log files.
- Coordinated with the administrator team to analyze MapReduce job performance and resolve any cluster-related issues.
- Performed platform-related Hadoop production support tasks by analyzing job logs.
- Coordinated with different teams to determine root causes and take steps to resolve them.
- Managed and reviewed Hadoop log files to identify issues when jobs failed and to find the root cause.
- Utilized ServiceNow to provide application support for existing clients.
- Created partitioned tables in Hive for better performance and faster querying.
- Managed and scheduled jobs on the Hadoop cluster using crontab, the Oozie scheduler, and Event Engine (an in-house tool).
- Ensure the quality and integrity of data are maintained as design and functional requirements change.
- Develop and document design/implementation impacts based on system monitoring reports.
- Reduced the number of open tickets for a couple of use cases by analyzing, categorizing, and prioritizing all recorded open issues. This required motivating the offshore team and thoughtfully delegating workable tickets to offshore resources to maximize efficiency.
- Reduced response time to clients on issues reported in the production environment.
- Drafted support procedures to decrease client reported ticket/issue turnaround.
- Used the ServiceNow ITSM tool to create Incidents, Problems, Knowledge Articles, and Change Requests.
- Presented weekly status reports to client on use-case progress and issue tracker.
- Guided offshore programmers assigned to the Production Support group.
- Conducted code reviews to maintain code quality.
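A minimal sketch combining the flat-file-to-DataFrame ingestion and the sensitive-field protection described above (delimiter, paths, and column names are illustrative; masking stands in here for the project's actual encryption routine):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.regexp_replace

val spark = SparkSession.builder()
  .appName("flatfile-masking-sketch")
  .getOrCreate()

// Ingest a delimited flat file into a structured DataFrame
// (delimiter, path, and column names are placeholders).
val txns = spark.read
  .option("sep", "|")
  .option("header", "true")
  .csv("/data/incoming/txns.dat")

// Mask all but the last four digits of the card number; in the actual
// project this step applied a proper encryption routine, not masking.
val protectedTxns = txns.withColumn(
  "card_number",
  regexp_replace(txns("card_number"), "\\d(?=\\d{4})", "*")
)

protectedTxns.write.mode("overwrite").orc("/data/curated/txns_orc")
```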
Environment: Linux, Hadoop, Hive, HBase, GIT, Spark, Scala, Map Reduce, Sqoop.
Confidential
Software Engineer
Responsibilities:
- Developed MapReduce pipeline jobs to process data and create the necessary files in Hadoop (a sketch follows at the end of this section).
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented a MapReduce-based, large-scale parallel relation-learning system.
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
- Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
- Used ZooKeeper to provide coordination services for the cluster.
- Developed the application using an MVC architecture with JSPs and Servlets.
- Involved in developing Class diagrams, Sequence Diagrams using UML.
- Designed various interactive front-end web pages using HTML, CSS, jQuery, and Bootstrap.
- Developed HTML and JSP pages for user interaction and data presentation.
- Developed JSPs and Servlets to implement the business logic, using JavaBeans to retrieve data.
- Implemented the Spring Batch module to achieve batch transactions.
- Designed and developed moderately complex units/modules/products that meet requirements.
- Maintenance and upgrades (new features, refactoring, bug fixing) of existing programs using MFC and Win32 API wherever needed.
- Performed unit/module testing of software to find errors to make sure programs meet specifications.
- Documented the work performed and the solutions built.
- Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
- Provided accurate time estimates for project work.
- Collaborated with quality assurance team in creation of test plans and participated in reviews.
- Participated in design and code reviews with other developers.
- Worked in small scrum teams in an agile development environment.
- Used Jira as the issue-tracking system, including Jira setup.
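A minimal sketch of a MapReduce pipeline job of the kind described in the first bullet above, written in Scala against the Hadoop API (token counting is a stand-in for the actual processing logic):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Mapper: emit (token, 1) for every whitespace-separated token.
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { t =>
      word.set(t); ctx.write(word, one)
    }
}

// Reducer: sum the counts emitted for each token.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it = values.iterator()
    while (it.hasNext) sum += it.next().get()
    ctx.write(key, new IntWritable(sum))
  }
}

object PipelineJobSketch {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "pipeline-sketch")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[TokenMapper])
    job.setCombinerClass(classOf[SumReducer])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```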