Hadoop Developer Resume

NY

SUMMARY

  • Innovative and highly qualified software professional with more than 7 years of experience in J2EE, web technologies and Python, including 4 years in the Big Data ecosystem.
  • Driven to architect Big Data analytics solutions across multiple platforms.
  • Extensive knowledge on Hadoop technologies like HDFS, YARN, MapReduce, Spark, Pig, Hive, Flume, Sqoop, Kafka, Oozie and Zookeeper.
  • Experienced in loading and transforming large sets of structured, semi-structured and unstructured data from HBase through Sqoop into HDFS for further processing.
  • Worked on several Hadoop Distribution platforms like Cloudera and Hortonworks.
  • Extensive knowledge on Lambda Architecture.
  • Used Spark Streaming to analyze and transform large datasets.
  • Worked on data frames with Python scientific packages such as Pandas, together with the datetime module, to handle large datasets effectively (see the sketch following this list).
  • Experience in developing SQL scripts, indexes and complex queries for data analysis and extraction.
  • Interacted with e-commerce store APIs to clean, parse and load data into SQL tables.
  • Experience in working with cloud environments like AWS EC2 and S3.
  • Experience in developing web pages using front-end technologies.
  • Worked in a matrixed environment with numerous teams including product marketing, demand generation, strategy and planning.
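
As one minimal sketch of the Pandas workflow referenced above (the file name and column names are hypothetical), chunked loading with timestamp parsing keeps memory bounded on large files:

    import pandas as pd

    # Minimal sketch: read a large CSV in chunks, parse the timestamp column
    # while loading, then aggregate event counts per day.
    # "web_events.csv" and "event_time" are illustrative names.
    chunks = pd.read_csv("web_events.csv", parse_dates=["event_time"],
                         chunksize=100_000)
    daily = pd.concat(
        chunk.set_index("event_time").resample("D").size() for chunk in chunks
    ).groupby(level=0).sum()
    print(daily.head())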

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, Spark, YARN, Hive, Sqoop, Pig, Oozie, MongoDB, Cassandra, Kafka, Flume.

BI Tools: Tableau

Programming Languages: Java, JavaScript, jQuery, Bash, Python, PHP, Scala, Pig Latin, HiveQL, HTML, CSS

Databases: MySQL, Oracle, HBase (NoSQL)

IDEs: PyCharm, Eclipse, NetBeans, Dreamweaver.

Platforms: Windows, Linux

PROFESSIONAL EXPERIENCE

Confidential, NY

Hadoop Developer

Responsibilities:

  • Performed business requirements analysis and conducted research to build solutions for the advanced analytical problems at the core of each project.
  • Responsible for data cleaning and for developing MapReduce programs in Python and Java, with various tools of the Hadoop ecosystem, to analyze unstructured data obtained through Flume and structured data from RDBMS sources.
  • Exported the analyzed data to the RDBMS using Sqoop for visualization and to generate reports for the BI team.
  • Used the Aspera client installed on an Amazon EC2 instance to connect to HDFS and store data in Amazon S3.
  • Loaded data into Spark RDDs to perform in-memory computations as per requirements.
  • Used the Python subprocess module to invoke PySpark jobs, SFTP transfers and Oracle stored procedures.
  • Used Sqoop to move data between HDFS and the RDBMS.
  • Used Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS and HBase using Scala (a sketch of this flow follows the list).
  • Created data pipelines to load logs from the S3 data lake into EMR.
  • Cleaned data using Pig and stored it in Redshift.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Converted SQL/Hive queries into Spark transformations using Scala and Spark RDDs.
  • Set up AWS EC2 instances to run Apache Spark using Terraform scripts.
  • Pulled raw data from RDS into EMR using Sqoop and stored the transformed data back in RDS after Spark processing.
  • Used Scala to develop transformation and validation code and to connect to AWS S3 for pulling and storing data.
  • Collected and aggregated large amounts of web log data from sources such as web servers, mobile and network devices using Apache Flume, and stored the data in HDFS for analysis.
  • Developed optimal strategies for distributing the web log data over the cluster and for importing and exporting the stored web log data into HDFS and Hive.
  • Worked on performance optimization of MapReduce jobs by analyzing I/O latency, map time, combiner time, reducer time, etc.
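
A minimal PySpark sketch of the Kafka-to-HDFS flow described above (the project itself used Scala and the DStream API); the broker address, topic name and paths are illustrative, and the spark-sql-kafka connector is assumed to be on the classpath:

    from pyspark.sql import SparkSession

    # Minimal sketch: consume a Kafka topic with Structured Streaming and
    # persist the raw stream to HDFS. Broker, topic and paths are illustrative.
    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "weblogs")
              .load()
              .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)"))

    query = (stream.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/weblogs/raw")
             .option("checkpointLocation", "hdfs:///checkpoints/weblogs")
             .start())
    query.awaitTermination()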

Environment: Hadoop, Hive, MapReduce, Python, Java, Kafka, Docker, Spark, Sqoop, Oracle, MySQL, Git

Confidential, Watertown, MA

Hadoop Developer

Responsibilities:

  • Transformed functional and technical requirements into detail design.
  • Performed qualitative analysis on vast amounts of data stored in company's databases, data warehouses, and data marts.
  • Installed and configured a multi-node Hadoop cluster on Amazon EC2 instances.
  • Installed and configured HDFS, YARN, MapReduce, Pig and Hive.
  • Loaded data into HDFS from HBase and local file systems using Hadoop file system commands.
  • Imported and exported data between RDBMS and HDFS using Sqoop.
  • Cleaned data during import, such as changing delimiters and file formats, using Pig Latin.
  • Used Hive extensively to access the data and to publish it in well-known schemas.
  • Created external tables with proper partitions for efficiency and loaded the structured data into HDFS.
  • Created Hive tables, loaded data and ran Hive queries on that data.
  • Developed and maintained different MapReduce, Hive and Pig jobs through workflows in Oozie.
  • Responsible for writing MapReduce code for integrating and transforming data sets.
  • Loaded data from HDFS into RDDs for Spark applications and wrote the results back to HDFS.
  • Used Spark SQL with the Hive metastore as an input source for Spark applications.
  • Filtered data and joined disparate datasets using Spark (see the sketch after this list).
  • Leveraged AWS S3 to store the raw data pulled from Salesforce and used S3 to store the transformed data after the Spark processing.
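
A minimal sketch of the Spark SQL filter-and-join pattern referenced above, reading through the Hive metastore; the database, table and column names are hypothetical:

    from pyspark.sql import SparkSession

    # Minimal sketch: read Hive-metastore tables via Spark SQL, filter one
    # dataset, join it with another and persist the result.
    spark = (SparkSession.builder
             .appName("hive-join")
             .enableHiveSupport()
             .getOrCreate())

    orders = spark.table("warehouse.orders").filter("order_status = 'SHIPPED'")
    customers = spark.table("warehouse.customers")

    shipped = orders.join(customers, on="customer_id", how="inner")
    shipped.write.mode("overwrite").parquet("hdfs:///data/shipped_orders")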

Environment: Pig, Hive, Hbase, SQL, EC2, S3, Spark, Python, Java, Scala, Git, Jira

Confidential, NY

Hadoop Developer

Responsibilities:

  • Assisted in preparation of analytical reports by collecting, defining and interpreting the data.
  • Ingested real-time and near-real-time streaming data into HDFS.
  • Processed streaming data as it was loaded onto the cluster.
  • Responsible for tuning Pig and HiveQL scripts.
  • Created Hive tables, loaded data from RDBMS using Sqoop and wrote Hive queries.
  • Responsible for exporting data from Unix file systems to HDFS.
  • Managed, reviewed and exported log files generated from various sources to HDFS.
  • Involved in migrating jobs from development to test and production environments.
  • Responsible for developing and defining MapReduce job flows.
  • Extracted, cleaned, audited and prepared data for analysis, following well-structured procedures and maintaining reproducibility of results.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins and pre-aggregations before storing the data in HDFS.
  • Wrote various APIs to read HBase tables, cleanse the data and write to other HBase tables.
  • Involved in creating Hive tables, loading data and running Hive queries on that data.
  • Used Spark APIs over Cloudera to perform analysis in real time.
  • Analyzed web log data using HiveQL to calculate the number of users who visited each page, the time they spent and the transactions they performed (illustrated in the sketch after this list).
  • Create and maintain quick reference guides and standard operating procedures.
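
A minimal sketch of the kind of HiveQL used for the web log analysis above, wrapped in PySpark; the database, table, column names and partition value are hypothetical:

    from pyspark.sql import SparkSession

    # Minimal sketch: HiveQL over partitioned web logs, counting distinct
    # visitors and average session duration per page for one day's partition.
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    page_metrics = spark.sql("""
        SELECT page_url,
               COUNT(DISTINCT user_id)    AS visitors,
               AVG(session_duration_secs) AS avg_duration_secs
        FROM weblogs.page_views
        WHERE dt = '2017-06-01'
        GROUP BY page_url
    """)
    page_metrics.show(20)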

Environment: Sqoop, PIG, Hive, Python, Oracle, MySQL, Parquet, Docker, Jira, Git, Excel, Tableau

Confidential

Associate Software Engineer

Responsibilities:

  • Worked extensively with Bootstrap, JavaScript, and jQuery to optimize the user experience.
  • Developed applications using Struts with Spring integration and the Tiles framework.
  • Modified queries, functions, cursors, triggers and stored procedures for the MySQL database to improve performance while processing data.
  • Developed modules using the Spring Framework for dependency injection through configuration files, easing the integration of different frameworks.
  • Used Jenkins along with Maven for continuous integration.
  • Trained extensively in agile methodologies and applied them by participating in scrum stand-ups.
  • Provided post-deployment support for upgrades and enhancements in UAT and production environments.
  • Responsible for debugging issues on the project, tracked in JIRA under Agile.
  • Communicated internally with application owners, vendors and users to resolve issues.

Environment: HTML, CSS, Python, Java, JavaScript, WebSphere, WebLogic, IIS, APACHE, Tomcat, ARCOS

Confidential

Junior Web Developer

Responsibilities:

  • Developed secure, dynamic websites from scratch, carrying designs from concept through to the UI.
  • Created landing, registration and survey pages to gather and manage user data using PHP and MySQL.
  • Used Bootstrap to manage and organize the HTML page layout.
  • Developed generic database connection pooling on the WebLogic admin server using Spring, along with SQL query optimization.
  • Used RESTful APIs to access data from different suppliers.
  • Implemented web services using Spring Web Services.
  • Used Microsoft Expression Web and Adobe Dreamweaver to design the websites and speed up the workflow.
  • Developed and designed SQL procedures and Linux shell scripts for exporting, importing and converting data.
  • Responsible for deploying CRs on live, DMZ and DR servers.

Environment: HTML, CSS, Bootstrap, Photoshop, Web Expression, jQuery, Java, MySQL, PHP, Linux
