Sr. Big Data Engineer Resume
Bloomfield, CT
SUMMARY:
- 10 years of professional IT experience.
- About 3.5 years of experience deploying, maintaining, monitoring, and upgrading Hadoop clusters (Apache Hadoop, Cloudera, and Hortonworks).
- 6 years of experience with Data Warehouse (DW) and Business Intelligence (BI) tools, along with experience working with Linux systems.
- Innovative and self-directed individual with strong interpersonal skills; highly adaptable, quick to learn, and adept at analyzing situations and taking the initiative to solve problems.
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Flume, ZooKeeper, Sqoop, Oozie, Storm, Spark, Solr, Impala, CDH & HDP distributions.
Security Systems: MIT Kerberos, Apache Ranger, Apache Sentry
Monitoring Tools: Checkmk, Nagios, Ganglia
Operating Systems: Linux (Red Hat, CentOS, Ubuntu), Windows (7, Vista, XP, 2003).
Languages: Python, Pig Latin, SQL, PL/SQL, T-SQL, C, Core Java, JavaScript, UNIX shell scripting, HTML, XML
ETL Tools: Talend, InfoSphere DataStage 7.x/8.x, SSIS
BI Tools/Analytics: Tableau, SSRS, OBIEE
Databases: Oracle, MySQL, SQL Server, Teradata, HBase, OpenTSDB, KairosDB
Automation Tools: Puppet, Chef
WORK EXPERIENCE:
Sr. Big Data Engineer
Confidential, Bloomfield, CT
Responsibilities:
- Built, installed, configured, and managed Hadoop clusters using the Cloudera distribution.
- Designed, installed, and maintained highly available systems (including monitoring, security, backup, and performance tuning).
- Integrated third-party technologies such as SAP and Tableau, and tools such as Kyvos and AtScale, with the Hadoop environment.
- Analyzed system failures, identified root causes, and recommended courses of action.
- Performed commissioning and decommissioning of DataNodes (see the sketch at the end of this list).
- Experienced in real-time data ingestion into Hive using Spark.
- Experienced with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Expertise in debugging user requests and issues and resolving them in a timely manner.
- Administered PostgreSQL, HBase, and other NoSQL databases.
- Administered user access to applications such as Hue.
- Managed add-on services such as custom service descriptors.
- Led Cloudera upgrades.
- Monitored workload, job performance, and capacity planning for clusters.
- Provided 24/7 production support.
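A minimal sketch of the DataNode decommissioning flow referenced above, assuming the standard HDFS exclude-file mechanism; the hostname and file path are hypothetical (on CDH, Cloudera Manager's Decommission action drives the same process):

    # Add the host to the HDFS exclude file, then have the NameNode re-read it.
    echo "datanode07.example.com" >> /etc/hadoop/conf/dfs.exclude
    hdfs dfsadmin -refreshNodes
    # Watch the node until its status reads "Decommissioned" (blocks re-replicated).
    hdfs dfsadmin -report | grep -A 6 "datanode07.example.com"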
Environment: Cloudera CDH 5.7, Cloudera Manager, Kerberos, Sentry, PostgreSQL, CentOS, MapReduce, HDFS, Spark, Pig, Hive, Python, Sqoop, HBase, ZooKeeper, Oozie, Hue, Kyvos, AtScale.
Big Data Engineer
Confidential, Pittsburgh, PA
Responsibilities:
- Provided architectural designs for hardware configuration and deployment diagrams.
- Configured a fully distributed Hadoop cluster on bare metal using stock Apache software.
- Introduced the Hortonworks Data Platform (HDP) to the company, implemented it from scratch, and documented the process.
- Expertise in integrating tools across the Apache Hadoop stack, including MapReduce, Hive, Pig, Sqoop, HBase, ZooKeeper, and Oozie.
- Advised developers on using the Hadoop cluster, from incremental MySQL ingestion via the Sqoop metastore (see the sketch after this list) to a use case for adding Storm/Kafka to the stack.
- Implemented security with Kerberos authentication and introduced authorization with Ranger.
- Performed manual benchmarking tests to measure Hadoop cluster performance (see the benchmark sketch below).
- Implemented Ganglia/Nagios monitoring on the Apache cluster.
- Tuned configuration parameters for YARN, MapReduce, HiveServer2, and the Hive metastore.
- Implemented automation scripts for cleaning up disk space and backing up databases.
- Installed a MySQL RDBMS database replica.
- Worked closely with DevOps, providing requirements for file system layout, OS configuration, and kernel-level parameter changes.
- Trained other team members by documenting the installation process.
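A minimal sketch of the Sqoop incremental-load pattern suggested above; the JDBC URL, credentials, table, and column names are hypothetical:

    # Create a saved job; the Sqoop metastore persists --last-value between runs.
    sqoop job --create orders_incr -- import \
      --connect jdbc:mysql://db.example.com/sales --username loader -P \
      --table orders --incremental append --check-column order_id --last-value 0 \
      --target-dir /data/raw/orders
    # Each run imports only rows whose order_id exceeds the stored high-water mark.
    sqoop job --exec orders_incr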
Environment: Apache Hadoop 2.6.0, Ambari 2.1.1, HDP 2.3.0, MIT Kerberos, Ranger, MySQL, CentOS 6.6, MapReduce, HDFS, Pig, Hive 1.4, Sqoop, HBase 0.98, ZooKeeper, Oozie, Tez.
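The manual benchmark runs mentioned above typically follow this pattern; the jar paths shown are the usual HDP locations but vary by installation:

    # TeraGen/TeraSort: generate ~10 GB (100 M rows of 100 bytes) and sort it.
    hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar teragen 100000000 /bench/in
    hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar terasort /bench/in /bench/out
    # TestDFSIO: measure raw HDFS write throughput with 10 files of 1 GB each.
    hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000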
Big Data Consultant
Confidential, NC
Responsibilities:
- Supported and maintained existing Hadoop clusters; fine-tuned configurations for optimal utilization of cluster resources.
- Performed Cloudera upgrades.
- Managed OS configuration with Puppet.
- Collected Hadoop performance metrics and tuned the clusters, using monitoring tools such as Checkmk and Icinga.
- Imported data frequently from multiple RDBMS sources into HDFS using Sqoop.
- Supported operations team in Hadoop cluster maintenance activities including commissioning and decommissioning nodes and upgrades.
- Monitored and troubleshot Hadoop clusters using Ganglia and Nagios.
- Managed and reviewed Hadoop log files.
- Supported MapReduce Programs and distributed applications running on the Hadoop cluster.
- Successfully loaded files into Hive and HDFS from LAMP servers.
- Prepared a multi-cluster test harness to exercise the system for performance, failover, and upgrades.
- Ensured data integrity by checking for block corruption with fsck and other Hadoop admin tools.
- Benchmarked and tuned the Hadoop cluster for performance.
- Provided security for the Hadoop cluster with Kerberos.
- Maintained user provisioning across clusters; built automated provisioning scripts to create users in LDAP and principals in the KDC (see the sketch after this list).
- Administered large-scale Cloudera Hadoop environments, including design, cluster setup, performance tuning, and monitoring in an enterprise environment.
- Set up, configured, and managed security.
- Automated cluster node provisioning and other repetitive tasks.
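A minimal sketch of the automated user provisioning described above; the LDAP bind DN, Kerberos realm, and file paths are hypothetical:

    #!/bin/bash
    # Provision one user: an LDAP account plus a matching Kerberos principal.
    user="$1"
    # Add the account to LDAP from a templated LDIF entry.
    ldapadd -x -D "cn=admin,dc=example,dc=com" -W -f "/tmp/${user}.ldif"
    # Create the principal in the MIT KDC with a random key.
    kadmin.local -q "addprinc -randkey ${user}@EXAMPLE.COM"
    # Export a keytab so the account can authenticate non-interactively.
    kadmin.local -q "ktadd -k /etc/security/keytabs/${user}.keytab ${user}@EXAMPLE.COM"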
Environment: Cloudera CDH 5.4.x, Hadoop 2.6, CentOS 6.6, MapReduce, HDFS, Pig, Hive 1.1, Sqoop, HBase, ZooKeeper, Oozie, shell scripting.
Sr. ETL Analyst/Developer
Confidential, NJ
Responsibilities:
- Expertise in designing and implementing DataStage architecture for data warehousing and Business Intelligence projects.
- Worked with the functional team and data modelers/architects to identify and understand data from different source systems.
- Involved in the analysis of the functional side of the project, interacting with functional experts to design and write technical specifications.
- Worked on the architecture of the ETL process.
- Created DataStage jobs (ETL processes) to continuously populate the data warehouse from source systems such as ODS and flat files, and scheduled them with the DataStage Sequencer for SI testing.
- Extracted data from sources such as Oracle and flat files.
- Prepared development timing plans, reported supplier progress to senior management, and secured supplier engineering support for onsite integration and production launch.
- Involved in quality assurance, unit testing, and integration testing of jobs and the overall system process flow.
- Worked on change requests according to client and project technical specifications.
- Maintained awareness of the functional/business aspects of the components.
- Automated job monitoring, minimizing manual intervention, and documented the process (see the sketch after this list).
- Provided support for weekly/monthly production batch runs.
- Involved in documenting the ETL phase of the project.
- Developed reusable components and best practices that were later used in other data warehouses.
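A minimal sketch of the kind of automated job monitoring described above, using DataStage's dsjob command-line client; the project and job names are hypothetical:

    #!/bin/bash
    # Poll the status of each nightly DataStage job and surface failures.
    PROJECT="DWH_PROD"
    for job in LoadCustomers LoadOrders BuildAggregates; do
        status=$(dsjob -jobinfo "$PROJECT" "$job" | grep "Job Status")
        echo "$(date '+%F %T') $job: $status"
        # Dump a recent log summary when the last run did not finish cleanly.
        if ! echo "$status" | grep -q "RUN OK"; then
            dsjob -logsum "$PROJECT" "$job" | tail -20
        fi
    done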
Environment: DataStage 9.x, DB2, Oracle 11g, UNIX, Microsoft Visio, Control-M, SQL*Plus, WinCVS.
Data warehouse / BI Developer
Confidential
Responsibilities:
- Understood the existing business model and customer requirements.
- Created the dimensional model (logical and physical).
- Designed the ETL flow and overall architecture.
- Translated business requirements into ETL parallel jobs that maximize object reuse, parallelism, and performance using DataStage.
- Implemented auditing and logging mechanisms.
- Involved in creating complex SQL queries.
- Developed complex stored procedures to create various reports.
- Performance tuning and testing on stored procedures, indexes and triggers.
- Involved in report design and coding for reports using SSRS.
- Deployed reports, created report schedules and subscriptions.
- Managed and secured reports using SSRS.
Environment: InfoSphere DataStage 7.5, UNIX, shell scripting, SQL, PL/SQL, Oracle, MS SQL Server 2008, SSRS.
DW/ BI Developer
Confidential
Responsibilities:
- Designed the target schema definition and ETL jobs using DataStage.
- Used DataStage Director to view and clear logs and to validate jobs.
- Mapped data items from source systems to the target system.
- Tuned the performance of ETL jobs.
- Involved in creating stored procedures, views, tables, and constraints.
- Generated reports from the cubes by connecting to Analysis server from SSRS.
- Designed and created Report templates, bar graphs and pie charts.
- Modified and enhanced existing SSRS reports.
Environment: MS SQL Server 2000 Enterprise, InfoSphere DataStage 7.5, Oracle, XML, UNIX shell scripting.
