Hadoop/Big Data Developer Resume
Alpharetta, GA
PROFESSIONAL SUMMARY:
- Over 5 years of IT experience in Big Data technologies, Spark, and database development.
- Strong experience working with HDFS, MapReduce, Spark, Hive, Sqoop, Flume, Kafka, Oozie, Pig and HBase.
- Experience working with DataFrames, RDD, Spark SQL, Spark Streaming, APIs, System Architecture, and Infrastructure Planning.
- Experience using Hadoop distributions such as Cloudera and Hortonworks.
- Worked extensively to integrate Hortonworks with the BI tool Tableau.
- Expertise in developing solutions to analyze large data sets efficiently.
- Integrated Kafka with Spark Streaming for real-time data processing.
- Built real-time Big Data solutions using HBase, handling millions of records.
- Implemented Hadoop-based data warehouses and integrated Hadoop with Enterprise Data Warehouse systems.
- Experience with the Hadoop framework, the Hadoop Distributed File System (HDFS), and parallel processing implementations.
- Skilled in writing MapReduce jobs in Pig and Hive.
- Built and supported large-scale Hadoop environments, including design, configuration, installation, performance tuning, and monitoring.
- Experience in NoSQL and SQL databases.
- Involved in the creation of databases, tables, stored procedures, triggers, and user-defined functions, and in the installation and configuration of SQL Server.
- Extensive hands-on experience with Linux and Windows.
TECHNICAL SKILLS:
HADOOP ECOSYSTEM: HDFS, MapReduce, YARN (Cloudera, Hortonworks)
DATA ANALYTICS: Hadoop, Hive, Spark, Pig and Tableau
BIG DATA TOOLS: Sqoop, Oozie, HBase, and Flume
PROGRAMMING LANGUAGES: Python, Scala and Java
DATABASE TOOLS: Oracle, MySQL, MS SQL Server
OPERATING SYSTEMS: Windows, Unix, Linux
PROFESSIONAL EXPERIENCE:
Confidential, Alpharetta, GA
Hadoop/Big Data Developer
Responsibilities:
- Work with the Hadoop ecosystem and implement Spark jobs in Scala, using the DataFrames and Spark SQL APIs for faster data processing.
- Develop Hive queries to process the data; implement partitions and buckets in Hive.
- Develop RDDs/DataFrames in Spark and apply several transformations to load data from Hadoop data lakes.
- Provide proofs of concept converting file data into Parquet format to improve query processing in Hive.
- Develop Hive and Pig scripts for joining the raw data with the lookup data and for aggregate operations as per the business requirements.
- Filter and clean data using Scala code and SQL queries.
- Install the Oozie workflow engine to run multiple MapReduce programs that run independently based on time and data.
- Import and export data into HDFS and Hive using Sqoop.
- Work with Flume to load the log data from multiple sources directly into HDFS.
Environment: Hadoop, HDFS, Spark, Hive, Kafka, JSON, Linux, HBase, Python, Parquet, Hortonworks, Scala, Sqoop, Flume, SQL.
Confidential, Sandy Springs, GA
PySpark/Hadoop Developer
Responsibilities:
- Developed Spark programs with Python and applied principles of functional programming to process complex structured and unstructured data sets.
- Analyzed SQL scripts and designed solutions to implement them using PySpark.
- Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
- Used JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
- Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables; handled structured data using Spark SQL.
- Set up HBase and stored data in HBase for use in analysis.
- Converted MapReduce programs into Spark transformations using Spark RDDs in PySpark.
- Implemented Spark using the PySpark API, utilizing DataFrames and the Spark SQL API for faster data processing.
Environment: Hadoop, HDFS, Spark, Spark Core, Spark Streaming, YARN, Hive, Sqoop, Zookeeper, Flume, Kafka, HBase, Python, SQL scripting, Kerberos, Linux Shell Scripting, JSON, Parquet, Hortonworks.
Confidential, Phoenix, AZ
Hadoop/Big Data Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs for data cleaning and preprocessing.
- Developed different system components, such as Hadoop processes that involve MapReduce and Hive.
- Worked with Hive on big data of logs to perform a trend analysis of user behavior on various online modules.
- Installed the Oozie workflow engine to run multiple jobs.
- Worked on sequence files, bucketing, and partitioning for Hive performance enhancement and storage.
- Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream the log data from servers.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Debugged and identified issues reported by QA in Hadoop jobs by configuring them to run against the local file system.
Environment: Hadoop, Cloudera, Linux, CentOS, MapReduce, HBase, Sqoop, Flume, HDFS, Python, Hive, SQL.
Confidential, Bellevue, WA
Big Data Developer
Responsibilities:
- Developed MapReduce programs that work seamlessly on Hadoop clusters.
- Worked with SQL, NoSQL, data warehousing, and database administration.
- Designed web services for swift data tracking and for querying data at high speeds.
- Tested software prototypes, proposed standards, and smoothly transferred them to operations.
- Translated complex functional and technical requirements into detailed designs.
- Performed analysis of vast data stores and uncovered insights.
- Maintained security and data privacy.
- Managed and deployed HBase.
- Worked on cluster coordination services through Zookeeper.
- Ran Hadoop streaming jobs to process terabytes of XML-format data.
- Created and maintained technical documentation for launching clusters and for executing Hive queries to make UDFs.
- Worked with BI tools such as Tableau to create dashboards, including daily, weekly, and monthly reports, using Tableau Desktop and published them to the HDFS cluster.
Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Oozie, Zookeeper, Tableau, Java, JSON, Linux, CentOS, Cloudera, Sqoop, Flume, SQL.
Confidential
SQL Developer
Responsibilities:
- Gathered business and user requirements.
- Involved in the creation of databases, tables, stored procedures, triggers, and user-defined functions, and in the installation and configuration of SQL Server.
- Backed up and restored SQL Server databases.
- Designed logical models and architecture.
- Collected data by designing survey questionnaires.
- Optimized database systems for performance efficiency.
- Mapped data between the source and the destination and documented the mapping in the data mapping spreadsheet.
- Supervised data entry into the SQL Server database through application interfaces and performed debugging.
Environment: Oracle 10g, SQL Developer, PL/SQL, shell scripts, Oracle Forms, SQL*Loader, WebFOCUS reporting, triggers, Wise Package Studio 7.0.
