Hadoop Developer Resume
Arlington, TX
SUMMARY:
- 8 years of experience in the Information Technology industry.
- Experience with Big Data processing using Apache Hadoop, primarily with HDFS, MapReduce/YARN, Hive and Pig.
- Experience supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud, including exporting and importing data to and from S3.
- Planning, upgrading, installing, configuring, maintaining and monitoring Hadoop clusters using Apache and Cloudera (CDH3, CDH4, CDH5) distributions.
- Used the Spark DataFrames API on the Hortonworks platform to perform analytics on Hive data.
- Skilled in HBase, ZooKeeper, Sqoop, Oozie and Flume.
- Experience in deploying Hadoop 2.0 (YARN).
- Exposure to configuring High Availability in Hadoop clusters.
- Firm understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce programming.
- Installation, configuration, support and management of Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions, as well as YARN-based distributions (CDH 5.x).
- Experience with ZooKeeper and ZKFC for managing and configuring NameNode failure scenarios.
- Implemented enterprise-level security using LDAP, Kerberos, Ranger and Sentry.
- Well versed in Hadoop cluster maintenance, troubleshooting, monitoring, and following proper backup and recovery strategies.
- Scheduled Hadoop, Sqoop, Hive and HBase jobs using Oozie.
- Experience in importing and exporting logs using Flume.
- Moved data from HDFS to RDBMS and vice versa using Sqoop.
- Firm knowledge of Data Warehousing concepts.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Skilled in managing and reviewing Hadoop log files.
- Set up Linux environments, including passwordless SSH, file system creation, disabling firewalls and SELinux, and Java installation.
- Working knowledge of IDEs such as Eclipse, PL/SQL Developer and TOAD.
- Sound understanding of Oracle database architecture, configuration, administration, and backup and recovery.
- Understanding of the software development life cycle (SDLC) process.
- Hands-on experience with Oracle advanced technologies such as 12c RAC, OEM, Grid Control, Data Guard (both logical and physical), partitioning and RMAN.
- Implemented data transfer using Export/Import utilities and Data Pump.
- Worked extensively on the latest Oracle versions, including upgrades, migrations and patching.
- Excellent written, verbal and interpersonal communication skills.
TECHNICAL SKILLS:
Big Data Technologies: Apache Hadoop, MapReduce, Cloudera, Hortonworks, HDFS, Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Oozie, AWS
Databases: HBase, Oracle 10g/11g/12c, Microsoft SQL Server
Languages: SQL, PL/SQL, HiveQL, Pig Latin, Java, Scala, Shell Scripting
Operating Systems: UNIX (Solaris, AIX, Red Hat Linux), Windows 9x/2000/NT/XP
Miscellaneous: Conversant with MS-Office suite
PROFESSIONAL EXPERIENCE:
Confidential, Arlington, TX
Hadoop Developer
Responsibilities:
- Imported and exported data between databases and HDFS using Sqoop.
- Extracted data from Oracle and SQL Server using Sqoop and loaded it into HDFS.
- Worked on Unix shell scripts for business processes and for loading data from different interfaces into HDFS.
- Scala/Java development.
- Analyzed a large number of data sources to determine the optimal way to aggregate and validate them, and automated the workflow with Oozie.
- Created reports for the BI team, using Sqoop to load data into HDFS and Hive.
- Designed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Worked in different environments, such as development and production clusters.
- Moved the code into app, development and stage directories based on requirements.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Developed a data pipeline using Kafka to store data in HDFS.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources.
- Developed data solutions on AWS.
Environment: Hadoop, MapReduce, Cloudera, HDFS, Hive, Pig, Java, Scala, Kafka, Shell Script, SQL, Sqoop, Eclipse, Hue, Oozie, HCatalog.
Confidential, Hartford, CT
Hadoop Developer
Responsibilities:
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Evaluated business requirements and prepared detailed specifications, in line with project guidelines, for the programs to be developed.
- Responsible for building scalable distributed data solutions using Hadoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Optimized Scala jobs to use HDFS efficiently by applying various compression mechanisms.
- Handled importing data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked on Unix shell scripts for business processes and for loading data from different interfaces into HDFS.
- Extensively used Pig for data cleansing.
- Created partitioned tables in Hive.
- Managed and reviewed Hadoop log files.
- Migrated ETL processes from Oracle to Hive to test ease of data manipulation.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig and wrote Pig Latin scripts.
- Created a Kafka consumer to receive and store near-real-time data in Amazon S3.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources.
- Implemented messaging services using Kafka.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
Environment: Hadoop, MapReduce, Hortonworks, HDFS, Hive, Scala, Kafka, ETL, Pig, Java, Shell Script, SQL, Sqoop, Eclipse.
Confidential, Warren, NJ
Hadoop Administrator/Developer
Responsibilities:
- Solid understanding of Hadoop HDFS, MapReduce and other ecosystem projects
- Installation and Configuration of Hadoop Cluster
- Working with the Cloudera support team to fine-tune the cluster
- Worked with a plugin that allows Hadoop MapReduce programs, HBase, Pig and Hive to work unmodified and access files directly; the plugin also provided data locality for Hadoop across host nodes and virtual machines.
- Developed MapReduce jobs to analyze data and provide heuristics reports
- Adding, Decommissioning and rebalancing nodes
- Rack Aware Configuration
- Configuring Client Machines
- Configuring monitoring and management tools using Cloudera
- Cluster High Availability setup
- Applying patches and performing version upgrades
- Incident Management, Problem Management and Change Management
- Performance Management and Reporting
- Recovery from NameNode failures
- Scheduling MapReduce jobs with the FIFO and Fair Share schedulers
- Installation and configuration of other open source software such as Pig, Hive, HBase, Flume and Sqoop
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala
- Integration with RDBMS using Sqoop and JDBC connectors
- Working with the dev team to tune jobs; knowledge of writing Hive jobs
Environment: RHEL, Puppet, CDH 3 distribution, Tableau, Datameer, HBase, CDH Manager, YARN, Hive, Flume, Kafka
Confidential, Rocky Hill, CT
Oracle Database Administrator
Responsibilities:
- Oracle Database Administration, tuning, development, analysis, design, all phases of SDLC, installation, patches, upgrades, migrations, configuration, database security, capacity planning, space management, data modeling, backup and recovery, cloning, auditing, SQL, PL/SQL, troubleshooting and documentation.
- Installation, Configuration and Maintenance of Oracle 10g/11g Real Application Cluster with ASM.
- Oracle Database Administration in a variety of environments like Oracle Linux, HP-UX, IBM-AIX, Sun Solaris, RHEL and EXADATA.
- Performed full database backups (hot and cold), incomplete backups, Data Pump Export/Import, SQL Backtrack and restore procedures.
- Experience with transportable tablespaces, SQL Tuning Advisor, Data Pump, Flashback Table, tablespace management, Virtual Private Database, materialized views and Oracle Streams.
- Responsible for monthly Production database loads in OLAP data warehouse environment.
- Experience with auditing, Oracle security, OEM, SQL Developer and TOAD.
- Monitored monthly database growth across all environments and provided assistance in capacity planning.
- Expertise in database creation, partitions, parameter setup, tablespaces, indexes, user setup, roles, security and storage, with hands-on experience in network management.
- Job scheduling using crontab along with TWS and the batch utility at the OS level, and shell scripting for automating DBA tasks.
- Worked on all priority tickets using HPALM/HPQC/HPSM/ServiceNow.
- Knowledge of performance tuning and SQL tuning with EXPLAIN PLAN, TKPROF, STATSPACK, SQL TRACE, AWR and ADDM to collect and maintain performance statistics and improve database performance.
- Worked closely with clients, business analysts, server administrators, systems programmers and application developers to define and resolve information flow and content issues, helping to transform business requirements into environment-specific databases.
- Proficient in monitoring production job streams using Tivoli.
- Excellent interpersonal skills and strong analytical and problem-solving skills, with a customer-service-oriented attitude.
- Provided 24/7 on-call production database support to ensure availability, efficiency and recoverability.
Confidential, Minneapolis, MN
Oracle Database Administrator
Responsibilities:
- Installation, administration, maintenance and configuration of Oracle databases 8i, 9i and 10g.
- Created and maintained database entities including tablespaces, data files, redo log files and rollback segments, and renamed/relocated data files on servers.
- Provided on-call support regarding production-related database problems.
- Used RMAN for backup and recovery strategy.
- Tuned databases, including server-side tuning and application tuning.
- Capacity planning of the databases / applications.
- Upgraded databases from Oracle 8i/9i to 9i/10g.
- Refreshing test and development environments (Oracle) from production databases as and when required.
- Optimized SQL queries to ensure faster response times.
- Worked with Oracle Enterprise Manager.
- Worked with Oracle Support to resolve database-related issues.
- Data replication using GoldenGate (bi-directional and multi replication) across all prod databases for consistency across all sites.
- DDL replication through GoldenGate; analyzed the SQL and tuned it if required before releasing the code.
- Troubleshot data replication issues with GoldenGate in prod and non-prod environments.
- Used transportable tablespaces, Data Pump and RMAN for database migration between platforms, such as Solaris to AIX.
Environment: Oracle 10g, 11g, RH Linux, Solaris, Installation, Migration, Upgrade, OEM, RAC, RMAN, ADDM, AWR, Statspack.
