Big Data Engineer Resume
NJ
PROFESSIONAL SUMMARY:
- More than 7 years of experience with a strong emphasis on developing, implementing, configuring, and testing Hadoop ecosystem components, and on the maintenance of various web-based applications.
- In-depth working knowledge of Distributed Systems Architecture and Parallel Processing Frameworks.
- Strong knowledge of the full software development life cycle: analysis, design, architecture, development, and maintenance.
- Experience in the ingestion, storage, querying, processing, and analysis of Big Data, with hands-on experience in the Big Data ecosystem, including Apache Spark and Spark SQL.
- Experience in HDFS, MapReduce, Pig, Hive, YARN, HBase, Sqoop, ZooKeeper, and Oozie.
- Experience in analyzing data using Hive, Pig Latin, HBase, and custom MapReduce programs.
- Wrote custom UDFs in Java for Hive and Pig, as well as custom SerDes for Hive (a minimal UDF sketch follows this list).
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance (see the DDL sketch after this list).
- Experienced in using Sqoop to import data from RDBMSs into HDFS/Hive and to export data from HDFS/Hive back to RDBMSs.
- Involved in scheduling batch job workflows using Oozie and Event Engine.
- Good knowledge of Kafka, including sustaining reads and writes of thousands of megabytes per second on streaming data (see the producer sketch after this list).
- Good knowledge of serialization formats like SequenceFile, Avro, Parquet, and ORC.
- Experience in working with different data sources like flat files, XML files, and databases.
- Expertise in writing Splunk queries to review historical data and trends.
- Experience in working with different Hadoop distributions, such as MapR, Cloudera (CDH 5 and CDH 6), and Databricks.
- Experience in the AWS cloud environment, including S3 storage and EC2 instances.
- Strong experience in implementing CI/CD (continuous integration and continuous deployment) and test-driven development using Jenkins.
- Expertise in using the ServiceNow tool to create Incidents, Problems, Knowledge Articles, and Change Requests.
- Hands-on experience with UNIX scripting.
- Strong work ethic with a desire to succeed and make significant contributions to the organization.
- Strong problem-solving, communication, and interpersonal skills; a good team player.
- Motivated to take on independent responsibility, with the ability to contribute as a productive team member.
- Presented multiple show-and-tell sessions to clients on Hadoop use cases.
- Good understanding and experience with Software Development methodologies like Agile and Waterfall.
- Expertise in using version control systems like SVN and GitHub, and CI tools like Jenkins.
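A minimal sketch of a custom Hive UDF of the kind referenced above, written in Scala for the JVM (the class name, the masking rule, and the registration commands are illustrative, not taken from an actual project):

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Illustrative UDF: masks all but the last four characters of an account
// number. Hive discovers evaluate() via reflection (classic UDF API).
class MaskAccount extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s = input.toString
    val visible = 4
    val masked =
      if (s.length <= visible) s
      else "*" * (s.length - visible) + s.takeRight(visible)
    new Text(masked)
  }
}

// Registered in Hive with (hypothetical jar/class names):
//   ADD JAR /tmp/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccount';
//   SELECT mask_account(acct_number) FROM accounts;
```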
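A sketch of the managed/external, partitioned table design mentioned above, issued as HiveQL through a Spark session (table names, columns, and locations are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-ddl-sketch")
  .enableHiveSupport()
  .getOrCreate()

// External table: Hive owns only the metadata; dropping the table leaves
// the files at LOCATION intact. Partitioning by date lets queries prune
// whole directories instead of scanning everything.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS txns_ext (
    txn_id BIGINT,
    amount DOUBLE,
    card_id STRING)
  PARTITIONED BY (txn_date STRING)
  STORED AS ORC
  LOCATION '/data/warehouse/txns_ext'
""")

// Managed (internal) table: Hive owns both the metadata and the data.
spark.sql("""
  CREATE TABLE IF NOT EXISTS txns_managed (
    txn_id BIGINT,
    amount DOUBLE,
    card_id STRING)
  PARTITIONED BY (txn_date STRING)
  STORED AS ORC
""")

// Bucketing, issued in Hive itself, would additionally declare:
//   CLUSTERED BY (card_id) INTO 32 BUCKETS
```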
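And a minimal Kafka producer sketch for the high-throughput streaming writes mentioned above (broker address, topic name, and tuning values are placeholders):

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092") // placeholder broker
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
// Batching and compression are the usual levers for raising write throughput.
props.put("linger.ms", "20")
props.put("batch.size", (256 * 1024).toString)
props.put("compression.type", "lz4")

val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord[String, String]("events", "key-1", "payload"))
producer.close()
```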
TECHNICAL SKILLS:
Programming Languages: Python, Scala, Java, C, C++, SQL.
Big Data technologies: Spark Core, Spark SQL, Hadoop, MapReduce, Pig, Hive, YARN, Sqoop, Oozie, Event Engine, Flume, ZooKeeper, Kafka, Elasticsearch.
Scripting Languages: Shell Script, Python.
Java Technologies: JDBC, JSP, JSON, Servlets, JUnit, REST, Web Services
Databases: Oracle, MySQL, PostgreSQL
NoSQL Databases: HBase, Cornerstone
Application Servers: WebLogic, WebSphere, Tomcat
IDEs and database tools: Eclipse, IntelliJ, Toad, SQL Developer
Operating Systems: Windows, Unix, Linux
Version Control: Git, SVN, Rational ClearCase
Development methodology: Agile, Scrum, Waterfall
Hadoop Distributions: Cloudera, Hortonworks, MapR, AWS, Databricks
PROFESSIONAL EXPERIENCE:
Confidential, NJ
Big Data Engineer
Responsibilities:
- Design, develop, test, deploy, and support Big Data applications on an EC2 Hadoop cluster in the Amazon Web Services (AWS) cloud.
- Responsible for gathering and managing all requirements for the project.
- Analyze all data sources, ensure they are productionized, and automate their ingestion into the platform as much as possible.
- Responsible for spinning up the Cloudera EC2 cluster and scaling nodes up or down according to requirements.
- Monitor and present cost-estimate graphs for the AWS nodes utilized by the project, per instance type.
- Apply performance-related tuning on top of the default Hive/Spark settings.
- Deploy and trigger Spark jobs in the Databricks environment on different instance types, working against DBFS and AWS S3 (see the sketch after this list).
- Orchestrate workflows to automate successive steps and incorporate appropriate quality checks within the process.
- Automate all jobs that pull data from the File Transfer Protocol server into the desired destination via FTP/SFTP, STP, or mainframe deliverables, using workflows.
- Generate audit reports and metadata in the desired format, keeping all asset code in source control and following best-in-class release management and source code practices.
- Perform code review, bug fixing and production activity when required.
- Perform controlled releases to implement any code/asset change for standardized delivery to various end customers.
- Develop and document design/implementation of requirements based on business needs.
- Use JIRA board for tracking the tickets.
- Use Confluence for bug tracking and documentation updates.
- Use Bitbucket as version control tool.
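A minimal sketch of the kind of Spark job deployed and triggered here, reading CSV from S3 and writing Parquet back (bucket names, paths, and the filter condition are placeholders, not actual project assets):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object S3JobSketch {
  def main(args: Array[String]): Unit = {
    // On Databricks a configured SparkSession already exists; getOrCreate()
    // reuses it, and s3a:// (or dbfs:/) paths resolve through the cluster's
    // credentials.
    val spark = SparkSession.builder().appName("s3-job-sketch").getOrCreate()

    val raw = spark.read
      .option("header", "true")
      .csv("s3a://example-bucket/incoming/")    // placeholder source bucket

    raw.filter(col("status") === "ACTIVE")      // illustrative transformation
      .write
      .mode("overwrite")
      .parquet("s3a://example-bucket/curated/") // placeholder destination

    spark.stop()
  }
}
```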
Environment: Linux, Hadoop, Hive, Oozie, Spark, Python, Scala, Java, GIT, CDH5, CDH6, Databricks.
Confidential, Phoenix, AZ
Software developer / Big Data
Responsibilities:
- Worked on a large-scale Hadoop YARN cluster (600+ nodes) for distributed data storage, processing, and analysis.
- Designed, developed, tested, deployed, and supported Big Data applications on the Hadoop cluster with Spark, MapReduce (Java), Sqoop, HBase, Hive, and Oozie.
- Responsible for gathering all required information and requirements for the project.
- Collect, aggregate, and move data from servers to HDFS using Apache Spark.
- Improved the performance of various Spark and Hive jobs, tuning them efficiently.
- Implemented Spark RDD transformations and actions to support business analysis.
- Worked extensively with Spark DataFrames to ingest flat files and transform unstructured data into structured data (a combined sketch follows this list).
- Used the Magellan tool to write Hive queries.
- Involved in encrypting sensitive data, such as card numbers and account numbers, across more than 100 million records.
- Stored, accessed, and processed data in different file formats, i.e., text, ORC, and Parquet.
- Developed UDFs to implement complex transformations on Hadoop.
- Supported MapReduce programs running on the cluster.
- Managed and reviewed Hadoop log files to identify bugs.
- Scheduled Oozie workflows and used Spring Batch to configure workflows for different job types, such as Hive, MapReduce, and shell.
- Wrote shell scripts to start use cases and perform pre-validations.
- Automated all jobs that pull data from the FTP server into Hive tables using Oozie workflows.
- Monitored and scheduled UNIX scripting jobs.
- Actively involved in code reviews and bug fixes to improve performance.
- Developed and optimized Hive UDFs (User-Defined Functions) to implement the functionality of external languages as and when required.
- Managed, reviewed and interpreted Hadoop log files.
- Coordinated with the administrator team to analyze MapReduce job performance and resolve any cluster-related issues.
- Performed platform-related Hadoop production support tasks by analyzing job logs.
- Coordinated with different teams to determine root causes and take steps to resolve them.
- Managed and reviewed Hadoop log files to identify issues when jobs failed and to find the root cause.
- Utilized ServiceNow to provide application support for existing clients.
- Created partitioned tables in Hive for better performance and faster querying.
- Managed and scheduled jobs on the Hadoop cluster using crontab, the Oozie scheduler, and Event Engine (an in-house tool).
- Ensure the quality and integrity of data are maintained as design and functional requirements change.
- Develop and document design/implementation impacts based on system monitoring reports.
- Reduced the number of open tickets for a couple of use cases by analyzing, categorizing, and prioritizing all recorded open issues. This required motivating the offshore team and thoughtfully delegating workable tickets to offshore resources to maximize efficiency.
- Reduced response time to clients on issues reported in the production environment.
- Drafted support procedures to decrease client reported ticket/issue turnaround.
- Used the ServiceNow ITSM tool to create Incidents, Problems, Knowledge Articles, and Change Requests.
- Presented weekly status reports to client on use-case progress and issue tracker.
- Guided offshore programmers assigned to the Production Support group.
- Conducted code reviews to maintain code quality.
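A minimal sketch combining the flat-file-to-DataFrame ingestion and the sensitive-field protection described above (delimiter, paths, and column names are illustrative; masking stands in here for the project's actual encryption routine):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.regexp_replace

val spark = SparkSession.builder()
  .appName("flatfile-masking-sketch")
  .getOrCreate()

// Ingest a delimited flat file into a structured DataFrame
// (delimiter, path, and column names are placeholders).
val txns = spark.read
  .option("sep", "|")
  .option("header", "true")
  .csv("/data/incoming/txns.dat")

// Mask all but the last four digits of the card number; in the actual
// project this step applied a proper encryption routine, not masking.
val protectedTxns = txns.withColumn(
  "card_number",
  regexp_replace(txns("card_number"), "\\d(?=\\d{4})", "*")
)

protectedTxns.write.mode("overwrite").orc("/data/curated/txns_orc")
```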
Environment: Linux, Hadoop, Hive, HBase, GIT, Spark, Scala, Map Reduce, Sqoop.
Confidential
Software Engineer
Responsibilities:
- Developed MapReduce pipeline jobs to process data and create the necessary files in Hadoop (a sketch follows at the end of this section).
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented a MapReduce-based, large-scale parallel relation-learning system.
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
- Collected and aggregated large amounts of log data using Apache Flume and staged it in HDFS for further analysis.
- Used ZooKeeper to provide coordination services for the cluster.
- Developed the application using an MVC architecture with JSPs and Servlets.
- Involved in developing Class diagrams, Sequence Diagrams using UML.
- Designed various interactive front-end web pages using HTML, CSS, jQuery, and Bootstrap.
- Developed HTML and JSP pages for user interaction and data presentation.
- Developed JSPs and Servlets to implement the business logic, using JavaBeans to retrieve data.
- Implemented the Spring Batch module to achieve batch transactions.
- Designed and developed moderately complex units/modules/products that meet requirements.
- Maintenance and upgrades (new features, refactoring, bug fixing) of existing programs using MFC and Win32 API wherever needed.
- Performed unit/module testing of software to find errors to make sure programs meet specifications.
- Documented the work performed and the solutions built.
- Involved in Unit Testing, User Acceptance Testing and Bug Fixing.
- Provided accurate time estimates for project work.
- Collaborated with quality assurance team in creation of test plans and participated in reviews.
- Participated in design and code reviews with other developers.
- Worked in small scrum teams in an agile development environment.
- Used Jira as the issue-tracking system, including Jira setup.
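A minimal sketch of a MapReduce pipeline job of the kind described in the first bullet above, written in Scala against the Hadoop API (token counting is a stand-in for the actual processing logic):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Mapper: emit (token, 1) for every whitespace-separated token.
class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val word = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { t =>
      word.set(t); ctx.write(word, one)
    }
}

// Reducer: sum the counts emitted for each token.
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    val it = values.iterator()
    while (it.hasNext) sum += it.next().get()
    ctx.write(key, new IntWritable(sum))
  }
}

object PipelineJobSketch {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "pipeline-sketch")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[TokenMapper])
    job.setCombinerClass(classOf[SumReducer])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```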