Lead Hadoop Engineer Resume
Princeton, NJ
PROFESSIONAL SUMMARY:
- Strong decision-maker with more than 19 years of experience in software engineering.
- Considered an effective coach and mentor and committed to leading development and administrative teams.
- Looking to work closely with both teams and customers to find the most efficient and beneficial solutions for process improvement.
- Self-directed and motivated Senior Software Engineer who works effectively in dynamic environments.
- Versatile Senior Manager specializing in Hadoop development and administration, skilled at planning, implementing, and overseeing key improvements to drive business growth and efficiency.
- History of cultivating an open culture with free exchange of information.
- Pursuing new professional challenges with a growth-oriented company.
SKILLS:
Computer Skills: ETL, SQL, PL/SQL, MS SQL, MySQL, HTML, Perl, Python, Shell scripting, Java, Hadoop administration, AWS, Google Cloud, Spark/Scala, PySpark, Pig, Hive, HBase, Storm, Kafka, Sqoop, Oozie, Flume, Knox, Ranger, Confidential, Ambari Views
Operating Systems: UNIX (Sun and HP), Linux (RedHat), CentOS, Ubuntu, Windows OS, Apple OS
Development Software: Informatica, Cognos, Business Objects, Xcelsius, Object Team, Adobe, TOAD (for Hadoop), SAP Business Objects Web Intelligence tools, RStudio, AtScale, Druid, Superset, Tableau, Microsoft Visio, Microsoft Office software, and SPSS. Familiar with Agile, JIRA, Remedy, Confluence, SVN, and Git.
PERSONAL SKILLS:
- Enthusiastic and flexible with new open source software methodologies and frameworks
- Strong verbal communication
- Data analysis
- Data management
- Methodology implementation
- Interpersonal and written communication
- Self-motivated
- Debugging proficiency
- Code validation skills
- Process implementation
WORK HISTORY:
Lead Hadoop Engineer
Confidential, Princeton, NJ
Responsibilities:
- Responsible for implementation, support, and management of the enterprise Hadoop analytic data lake environments.
- Create run book schedules for nightly jobs to meet business data load requirements.
- Produce daily cluster reports and documents for senior team members.
- Work directly with departments, clients, management, and vendors to achieve project goals.
- Assist various business groups with document organization and dissemination during software deployments.
- Support the Chief Operating Officer with daily operational functions.
- Support the Senior Data Architect with data consolidation efforts.
- Support the Senior IT Vice President with budget planning for Hadoop clusters.
- Attend weekly change review meetings to discuss planned cluster configuration changes intended to improve performance or correct defects.
- Create and implement scripts to manage cluster logs.
- Present cluster engineering plans to senior management.
- Handle cluster design, capacity planning, cluster setup, performance tuning, monitoring, structure planning, scaling, and administration.
- Troubleshoot memory issues and utilization across the clusters.
- Administer and maintain platforms such as Syncsort and AtScale.
- Perform ongoing implementation, administration, and maintenance of these platforms.
- Document and formalize processes for the environments.
- Document processes for the administration of the Hadoop platforms.
- Maintain a complete understanding of all applications in the Hadoop ecosystem.
- Install and maintain Apache NiFi clusters.
- Implement shell scripts to maintain log rotation across several application levels and platform configurations.
- Implement shell scripts to provide daily health reports and service monitoring on the cluster.
- Install AtScale development software.
- Use APIs to schedule AtScale relational cube builds.
- Engineer and develop runbooks for scheduled AtScale job execution.
- Create shell scripts that call these APIs to run AtScale jobs on a schedule (a sketch of this kind of trigger follows this list).
- Integrate with development and management teams to meet project goals.
- Collaborate with senior management to strategize cloud-to-on-premises integration of Hadoop and Apache NiFi clusters.
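Illustrative sketch of the scheduled AtScale build trigger described above; the host, endpoint path, token, and cube name are hypothetical placeholders, not AtScale's documented API:

#!/usr/bin/env python3
"""Sketch only: trigger a scheduled AtScale cube build through a REST call.
All names below are hypothetical placeholders for the actual deployment."""
import sys
import requests

ATSCALE_HOST = "https://atscale.example.com:10500"   # hypothetical host
API_TOKEN = "REPLACE_WITH_SERVICE_TOKEN"             # hypothetical token
CUBE_ID = "sales_cube"                               # hypothetical cube name

def trigger_cube_build(cube_id: str) -> bool:
    """POST a build request and report success or failure for the runbook log."""
    url = f"{ATSCALE_HOST}/api/cubes/{cube_id}/build"  # hypothetical path
    resp = requests.post(url, headers={"Authorization": f"Bearer {API_TOKEN}"},
                         timeout=60)
    return resp.ok

if __name__ == "__main__":
    ok = trigger_cube_build(CUBE_ID)
    print(f"Cube build for {CUBE_ID}: {'submitted' if ok else 'failed'}")
    sys.exit(0 if ok else 1)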
Sr Data Engineer, Senior Staff
Confidential, Harrison, NY
Responsibilities:
- Manage pre-prod and production clusters using Cloudera Manager to quickly identify failures and resource issues.
- Contribute to the vision, deployment and administration of Hadoop enabled infrastructure as a core service to all business functions within the organization.
- Build data expertise and own data quality for ingestion pipelines.
- Interface with engineers, product managers and product analysts to understand data needs.
- Architect, build and launch new data models that provide intuitive analytics to end users.
- Design, build, and launch efficient and reliable data pipelines to move data (both large and small volumes) to the data warehouses.
- Design, build and launch new data extraction, transformation and loading processes in production.
- Create new systems and tools to enable the customer to consume and understand data faster.
- Apply coding skills across a number of languages, including SQL, Python, PySpark, and Java (see the PySpark sketch after this list).
- Work across multiple teams in high visibility roles and own the solution end-to-end.
- Support the administration of on premise and cloud-based Hadoop clusters.
- Make information available to large scale, next generation, predictive analytics applications.
- Support and integrate big data tools/frameworks.
- Build, implement, and support the data infrastructure; ingest and transform data (ETL/ELT processes).
- Be available on call (on rotation) in a support role.
- Manage and participate in the day-to-day operational work across the Hadoop clusters.
- Work closely with hosted operations colleagues to define operational best practices for the UAT and production Hadoop clusters.
- Participate in project planning and reviews as they pertain to Hadoop and Hadoop clusters.
- Serve as the main point of contact for vendor escalation on open issues related to cluster maintenance, performance, and upgrades.
- Participate in interviewing candidates to fill open Hadoop development and administration positions.
- Verify data integrity and accuracy.
- Validate schematic designs working alongside Hadoop data engineers.
- Work closely with offshore teams to meet project objectives and support cluster maintenance.
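Illustrative PySpark sketch of the kind of ingestion pipeline described above; paths, table, and column names are hypothetical placeholders:

#!/usr/bin/env python3
"""Sketch of an ingestion step: read raw CSV, apply light transformations,
and load a warehouse table. Names are hypothetical placeholders."""
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("ingest_orders")        # hypothetical job name
         .enableHiveSupport()
         .getOrCreate())

# Read a raw landing-zone file (hypothetical HDFS path).
raw = spark.read.csv("/data/landing/orders/*.csv", header=True, inferSchema=True)

# Basic cleanup: drop duplicates, standardize a date column, filter bad rows.
clean = (raw.dropDuplicates(["order_id"])
            .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
            .filter(F.col("order_amount") > 0))

# Write to a partitioned warehouse table for downstream analytics.
(clean.write.mode("overwrite")
      .partitionBy("order_date")
      .saveAsTable("analytics.orders_clean"))

spark.stop()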
Sr Hadoop Administrator
Confidential, Tucson, AZ
Responsibilities:
- Install peripheral tools such as Confidential, RStudio and OpenLDAP.
- Manage users on the Hadoop platform.
- Encrypt data at rest and in motion.
- Serve as an escalation point for the Think Big Managed Services teams that provide on-going operational support for Think Big's customers.
- Collaborate with Think Big's project leads to develop project-level scoping plans.
- Research tools to accommodate customer requirements.
- Develop test plans for initial Hadoop services component testing.
- Develop POC application projects using Hive, Pig, PySpark, and Oozie.
- Perform system administration tasks, including using Ambari to install and provision Hadoop clusters, onboard users to the Hadoop cluster, and set up High Availability (HA) for key components such as the NameNode, Resource Manager, and any other component identified as critical to the customer or use cases.
- Work with Think Big's operations practice leads to develop and ensure consistent deployment of best practices across all of Think Big's projects.
- Design and develop tools to support proactive administration and monitoring of the open-source Hadoop big data platform (a monitoring sketch follows this list).
- Support sales efforts by scoping engagements and developing statements of work.
- Apply tuning to components such as Hive, HBase and Spark to enhance performance.
- Familiar with the use of Apache Ranger for user authorization, access control, and auditing on the Hortonworks Data Platform, and with Apache Knox for Hadoop perimeter gateway access.
- Use Linux shell scripting where necessary to automate tasks or to fill gaps where tools cannot perform the tasks.
- Familiar with Hadoop streaming components such as Kafka, Spark, Storm, and Flink.
- Integrate with client development teams to build applications and data pipelines based on specific use-case scenarios.
- Familiar with cloud computing environments such as AWS and Google Cloud.
- Collaborate with systems administrators and architects when necessary to perform system designs.
- Familiar with Virtual Machine tools such as VMWare and VirtualBox.
- Work with clients on both Hortonworks and Cloudera Hadoop distributions.
- Use both Ambari and Cloudera Manager to install and manage clusters.
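Illustrative sketch of a proactive monitoring check against the Ambari REST API, as referenced above; the Ambari host, cluster name, and credentials are placeholders:

#!/usr/bin/env python3
"""Sketch of a daily service-state check via the Ambari REST API, intended to
run from cron and be logged or mailed as a health report. Connection details
are placeholders."""
import requests

AMBARI_URL = "http://ambari.example.com:8080"   # placeholder Ambari host
CLUSTER = "prod_cluster"                        # placeholder cluster name
AUTH = ("admin", "REPLACE_ME")                  # placeholder credentials

def service_states():
    """Return {service_name: state} for every service in the cluster."""
    url = f"{AMBARI_URL}/api/v1/clusters/{CLUSTER}/services?fields=ServiceInfo/state"
    resp = requests.get(url, auth=AUTH, timeout=30)
    resp.raise_for_status()
    return {item["ServiceInfo"]["service_name"]: item["ServiceInfo"]["state"]
            for item in resp.json().get("items", [])}

if __name__ == "__main__":
    for name, state in sorted(service_states().items()):
        flag = "" if state == "STARTED" else "  <-- check"
        print(f"{name:20s} {state}{flag}")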
Lead Hadoop / Data Engineer
Confidential, Tucson, AZ
Responsibilities:
- Lead Hadoop Engineer for the implementation and maintenance of Hortonworks Data Platform (HDP 2.1 and HDP 2.3) Hadoop clusters for the data lake implementation.
- Perform cluster implementation and ongoing administration of the Hadoop ecosystem using tools such as Ambari, Ganglia, and Nagios.
- Align with software developers, database modelers, systems architects, data scientists, Linux system admins and enterprise security teams to meet enterprise requirements, project deadlines and POC use cases.
- Develop Spark application performing data extraction and manipulation for user data consumption.
- Implement Hive schemas / tables for data ingestion and user consumption.
- Perform initial testing of Hadoop services such as Kafka, Hive, Pig, HBase, and Flume.
- Support all development teams using the Hadoop cluster.
- Investigate, integrate and evaluate open source tools to lower costs, promote development proficiency and streamline production integration.
- Perform research on development environments to increase development efficiency.
- Integrate enterprise Active Directory and LDAP users into the ecosystem.
- Define, implement, and enforce security policies for users and services using tools such as Ranger, Knox, and Kerberos.
- Define HDFS directory structure for development teams, data scientists, and research teams.
- Perform continuous performance tuning, monitor cluster performance, and plan capacity using tools such as the Resource Manager and Job Tracker.
- Manage and review log files and automate the removal of unnecessary logs using shell scripts and cron jobs (a cleanup sketch follows this list).
- Communicate effectively with team members and users on maintenance, patches, and version upgrades when necessary.
- Document and distribute cluster layout for user reference.
- Prioritize and manage service issues.
- Serve as the main point of contact for vendor escalation on open issues related to cluster maintenance, performance, and upgrades.
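Illustrative sketch of the automated log cleanup described above, meant to run from cron; directories and the retention window are placeholders:

#!/usr/bin/env python3
"""Sketch of a cron-driven log cleanup: delete log files older than a
retention window under a set of directories. Paths and retention period are
placeholders for the actual cluster settings."""
import os
import time

LOG_DIRS = ["/var/log/hadoop", "/var/log/hive"]   # placeholder log locations
RETENTION_DAYS = 14                                # placeholder retention window

def purge_old_logs(directory: str, max_age_days: int) -> int:
    """Remove *.log files older than max_age_days; return the count removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if name.endswith(".log") and os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed += 1
    return removed

if __name__ == "__main__":
    for d in LOG_DIRS:
        if os.path.isdir(d):
            print(f"{d}: removed {purge_old_logs(d, RETENTION_DAYS)} old log files")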
Sr. Information Systems Technologist
Confidential, Tucson, AZ
Responsibilities:
- Demonstrate software functionality and describe data results to senior management.
- Create executive dashboards and detailed operational drill-down reports.
- Interrogate databases and develop scripts to meet business requirements.
- Develop SAP Business Objects front end solutions.
- Familiar with relational database and data warehouse concepts such as star schemas, dimensions and data marts.
- Develop DDL and DML scripts using client tools such as TOAD and the Oracle SQL client to define fact and dimension tables and to load data marts.
- Create SQL scripts to perform data analysis and aggregation validation.
- Develop bash, Perl, and Python scripts to perform data operations and conversions on raw files as needed (see the conversion sketch after this list).
- Use Informatica to perform ETL operations such as transformations, filter, joins, and merge on source files, raw files and systems.
- Develop enterprise business intelligence dashboards and customized reports for internal business units using Cognos and Business Objects software.
- Create underlying database design, such as DMR, OLAP and ROLAP, using Cognos framework manager to support business intelligence reports.
- Implement and administer Single Sign-On (SSO) on Apache web servers.
- Develop a web-based change request system using Apache and Embperl to gather information on requests for change and assemble the correct change control board based on the change required and the skill set needed.
- Schedule, conduct, and present regular tool development update meetings to stakeholders.
- Perform risk assessment and management for tool development and integration.
- Develop software tools using Java and Python for enterprise data management.
- Create software install notes for reference and tool maintenance documentation.
- Create and update technical documentation.
- Create user-training manual and provide training on tools created.
- Participate in full life-cycle software development.
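Illustrative sketch of a raw-file conversion step like those described above; the file names, delimiter, and column handling are assumptions:

#!/usr/bin/env python3
"""Sketch of a raw-file conversion: normalize a pipe-delimited extract into a
clean CSV before ETL loads it. File names and layout are placeholders."""
import csv

SRC = "raw_extract.txt"      # placeholder pipe-delimited source file
DST = "clean_extract.csv"    # placeholder CSV output for the ETL load

def convert(src: str, dst: str) -> int:
    """Trim whitespace, drop empty rows, and rewrite as comma-delimited CSV."""
    rows_written = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter="|")
        writer = csv.writer(fout)
        for row in reader:
            cleaned = [field.strip() for field in row]
            if any(cleaned):
                writer.writerow(cleaned)
                rows_written += 1
    return rows_written

if __name__ == "__main__":
    print(f"Wrote {convert(SRC, DST)} rows to {DST}")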
Linux System Administrator
Confidential, Garland, TX
Responsibilities:
- Perform day-to-day administration of Linux servers.
- Perform server upgrades; install standard and other required software, including Apache, MySQL, and PHP; and handle web server configuration, plugins, and SSL setup.
- Onboard new server users, creating user home directories and setting user privileges (an onboarding sketch follows this list).
- Set up ACL schemas for project support and other data project requirements.
- Good knowledge of Linux operating systems and administration, including system capacity, bottlenecks, and the basics of memory, CPU, OS, storage, and networking.
- Be available on call (on rotation).
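Illustrative sketch of the user onboarding step described above, wrapping the standard Linux useradd and setfacl commands; the username and project path are placeholders, and the script assumes it runs as root on a host with ACLs enabled:

#!/usr/bin/env python3
"""Sketch of server-user onboarding: create an account with a home directory
and grant the user ACL access to a project directory. Names are placeholders."""
import subprocess

USERNAME = "newanalyst"            # placeholder account name
PROJECT_DIR = "/data/projects/poc" # placeholder project directory

def onboard(user: str, project_dir: str) -> None:
    """Create the account (with home dir) and add an rwx ACL entry for it."""
    subprocess.run(["useradd", "-m", user], check=True)
    subprocess.run(["setfacl", "-m", f"u:{user}:rwx", project_dir], check=True)

if __name__ == "__main__":
    onboard(USERNAME, PROJECT_DIR)
    print(f"Onboarded {USERNAME} with access to {PROJECT_DIR}")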