Lead - Big Data Hadoop/Spark Resume
Raleigh, NC
SUMMARY
- 10+ years of overall experience in the design and deployment of Data Management and Data Warehousing projects, in roles including Data Modeler and Data Analyst, with a focus on Big Data technologies.
- 3+ years of hands-on Hadoop experience designing and developing Big Data applications with Apache Hadoop MapReduce, HDFS, Hive, HBase, Pig, Oozie, Sqoop, Flume, and Spark.
- Expertise in developing solutions around NoSQL databases such as MongoDB and Cassandra.
- Experience with major Hadoop distributions, including Cloudera and Hortonworks.
- Excellent understanding of Hadoop architecture, including MapReduce MRv1 and MRv2 (YARN).
- Developed multiple MapReduce programs to process large volumes of semi-structured and unstructured data files using a variety of MapReduce design patterns.
- Strong experience in writing MapReduce jobs in Java and Pig.
- Experience with performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins in MapReduce jobs (a minimal map-side join sketch in Java follows this summary).
- Excellent understanding of Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitization, report generation, and standardization.
- Excellent hands-on experience analyzing data using Pig Latin, HiveQL, HBase, and MapReduce programs in Java.
- Developed UDFs in Java as needed for use in Pig and Hive queries (a sample Hive UDF sketch also follows this summary).
- Worked with ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
- Strong knowledge of Hadoop, Hive, and Hive's analytical functions.
- Loaded datasets into Hive for ETL operations.
- Proficient in using big data ingestion tools such as Flume and Sqoop.
- Experience in importing and exporting data between HDFS and relational database management systems using Sqoop.
- Experience in handling continuous streaming data using Flume and memory channels.
- Good experience in benchmarking Hadoop clusters.
- Good knowledge of data analysis with SAS.
- Good knowledge of executing Spark SQL queries against data stored in Hive.
- Experienced in monitoring Hadoop clusters using Cloudera Manager and the web UI.
- Experience in establishing standards and processes for Hadoop-based application design and implementation.
- Extensive experience with web technologies such as HTML, CSS, XML, JSON, and jQuery.
- Hands-on experience with AWS (Amazon Web Services): running Elastic MapReduce (EMR) clusters, creating and storing data in S3 buckets, and creating Elastic Load Balancers (ELB) for Hadoop front-end web UIs.
- Extensive knowledge of creating Hadoop clusters on multiple EC2 instances in AWS, configuring them through Ambari, and using IAM (Identity and Access Management) to create groups and users.
- Extensive experience in documenting requirements, functional specifications and technical specifications.
- Extensive experience with SQL, PL/SQL and database concepts.
- Experience with version control tools such as SVN and Git (GitHub), JIRA for issue tracking, and Crucible for code reviews.
- Strong database background with Oracle, PL/SQL, stored procedures, triggers, SQL Server, and MySQL.
- Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.
- Good team player with strong interpersonal, organizational, and communication skills, combined with self-motivation, initiative, and project management abilities.
- Able to handle multiple priorities and a heavy workload, and quick to understand and adapt to new technologies and environments.
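As referenced in the summary above, a minimal map-side join sketch in Java follows. It is illustrative only: the comma-delimited layout, join-on-first-column convention, and argument order are assumptions, not code from a specific engagement. The small lookup file travels to each mapper via the distributed cache, is loaded into memory in setup(), and is probed in map(), so the job runs map-only with no shuffle of the small table.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapSideJoin {

  public static class JoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
    private final Map<String, String> lookup = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
      // Cache files are symlinked into the task's working directory by name.
      for (URI uri : context.getCacheFiles()) {
        try (BufferedReader in = new BufferedReader(
            new FileReader(new Path(uri.getPath()).getName()))) {
          String line;
          while ((line = in.readLine()) != null) {
            String[] parts = line.split(",", 2);     // assumed key,value layout
            if (parts.length == 2) {
              lookup.put(parts[0], parts[1]);
            }
          }
        }
      }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      String match = lookup.get(fields[0]);          // join on the first column
      if (match != null) {
        context.write(new Text(value + "," + match), NullWritable.get());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "map-side join");
    job.setJarByClass(MapSideJoin.class);
    job.setMapperClass(JoinMapper.class);
    job.setNumReduceTasks(0);                        // map-only: no shuffle/sort
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    job.addCacheFile(new URI(args[0]));              // small lookup file in HDFS
    FileInputFormat.addInputPath(job, new Path(args[1]));
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because the small side never crosses the network, this pattern avoids the sort/shuffle cost of a reduce-side join; Hive's map joins apply the same idea automatically when one table fits in memory.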
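Likewise, a minimal sketch of a Hive UDF in Java using the classic UDF interface; the class name, function name, and normalization logic are hypothetical stand-ins for project-specific UDFs.

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

@Description(name = "normalize_code",
    value = "_FUNC_(str) - trims and upper-cases a code column")
public final class NormalizeCode extends UDF {
  private final Text result = new Text();    // reused to avoid per-row allocation

  public Text evaluate(Text input) {
    if (input == null) {
      return null;                           // preserve SQL NULL semantics
    }
    result.set(input.toString().trim().toUpperCase());
    return result;
  }
}

Packaged into a jar, it would be registered per session with ADD JAR /path/to/udfs.jar; CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode'; and then called like any built-in function in HiveQL.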
TECHNICAL SKILLS
Hadoop Core Services: HDFS, MapReduce, Spark, YARN
Hadoop Distribution: Hortonworks, Cloudera, Apache
NoSQL Databases: MongoDB, Cassandra
Hadoop Data Services: Hive, Pig, Sqoop, Flume
Hadoop Operational Services: ZooKeeper, Oozie
Monitoring Tools: Ambari, Cloudera Manager
Cloud Computing Tools: AWS (Amazon Web Services)
Languages: C, Java, Python, SQL, PL/SQL, Pig, HiveQL, Unix Shell Scripting
Databases: Oracle, MySQL, MongoDB
Operating Systems: UNIX, Linux, Windows
Build Tools: Jenkins, Maven, Ant
Development Tools: Microsoft SQL Studio, Toad, Eclipse
Development Methodologies: Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE
Confidential, Raleigh, NC
Lead - Big Data Hadoop/Spark
Responsibilities:
- Involved in designing and implementing the migration plan from the legacy BDW to a modern, high-performance Hadoop big data lake.
- Helped streamline business processes by developing, installing, and configuring Hadoop ecosystem components that moved data from individual servers and mainframes to HDFS; analyzed product analytics, sales, and market/competitor statistics using Hadoop, Hive, and HBase in a big data environment.
- Developed a Sqoop framework to source historical data from Oracle BDW and DB2 (an illustrative sketch of this kind of parallel extract follows this list).
- Set up Flume, Sqoop, and Hive loads to transfer structured and unstructured data from various sources and the legacy BDW into HDFS, Hive, and HBase.
- Helped the team with ETL mappings and ran sessions and workflows to execute the land process of loading customer/product data from various source systems into the data lake.
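Sqoop itself is driven from the command line, so the Java sketch below shows a comparable parallel JDBC extract of historical Oracle data into the Hive lake using the Spark DataFrame API instead; this is a plainly named stand-in for the Sqoop framework, not its actual code, and the host, service, table, split column, and bounds are placeholders.

import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class HistoricalLoad {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("bdw-historical-load")
        .enableHiveSupport()                 // write directly into Hive tables
        .getOrCreate();

    Properties props = new Properties();     // Oracle JDBC driver must be on the classpath
    props.setProperty("user", args[0]);
    props.setProperty("password", args[1]);

    // Parallel range-partitioned read, analogous to Sqoop's
    // --split-by ORDER_ID --num-mappers 16.
    Dataset<Row> history = spark.read().jdbc(
        "jdbc:oracle:thin:@//bdw-host:1521/BDW",   // placeholder host/service
        "SALES.ORDERS",                            // placeholder source table
        "ORDER_ID", 1L, 100000000L, 16,
        props);

    history.write().mode(SaveMode.Append).saveAsTable("lake.orders_history");
    spark.stop();
  }
}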
Environment: Hadoop, HDFS, HBase, Hive, SQL, Sqoop, Flume, Oozie, Spark, Scala, Kafka
Confidential, Richfield, MN
Hadoop/Spark Big Data Developer
Responsibilities:
- Data ingestion: ingested data from Oracle, MySQL, and S3 into Hadoop so it could be queried using Hive and Spark SQL.
- Worked on Sqoop jobs for ingesting data from Oracle and MySQL.
- Created Hive external tables for querying the data.
- Used Spark DataFrame APIs to ingest S3 data (a minimal sketch follows this list).
- Wrote scripts to load data from Redshift.
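A minimal Java sketch of the S3 ingestion path referenced above, assuming an S3A-capable Spark build such as EMR provides; the bucket, paths, column name, and view name are placeholders.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class S3Ingest {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("s3-ingest")
        .enableHiveSupport()
        .getOrCreate();

    // Read raw delimited files from S3 into a DataFrame.
    Dataset<Row> raw = spark.read()
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("s3a://example-bucket/raw/orders/");

    // Expose the data to SQL, mirroring the external tables used for ad hoc queries.
    raw.createOrReplaceTempView("orders_raw");
    Dataset<Row> daily = spark.sql(
        "SELECT order_date, COUNT(*) AS cnt FROM orders_raw GROUP BY order_date");

    daily.write().mode("overwrite").parquet("s3a://example-bucket/curated/orders_daily/");
    spark.stop();
  }
}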