
Big Data Developer Resume


Austin, TX

SUMMARY:

  • Around 7 years of experience in the Information Technology industry, including 4 years with the Big Data ecosystem (Hadoop and Spark), working across all phases of the software development life cycle and Big Data analytics, with hands-on experience writing MapReduce jobs using Hadoop ecosystem components including Hive, Pig, HBase, Flume and Oozie.
  • Experience in Hadoop Ecosystem components like MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Oozie and Zookeeper.
  • Hands on experience in using various Hadoop distributions (Cloudera, Hortonworks).
  • Excellent knowledge of Hadoop architecture components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Experience in migrating data from HDFS to relational database systems and vice versa using Sqoop.
  • Experienced in configuring Flume to stream data into HDFS.
  • Expertise in implementing ad-hoc queries using HiveQL and good knowledge of creating Hive tables and loading and analyzing data using Hive queries.
  • Expertise in developing Hive generic UDFs to implement complex business logic and incorporate it into HiveQL.
  • Experienced working with Spark Streaming, SparkSQL and Kafka for real-time data processing.
  • Good Knowledge on applying rules and policies using ILM (Information Life Cycle Management) workbench for Data Masking Transformation and loading into targets.
  • Hands on experience with Spark Data frames, Spark-SQL and RDD API of Spark for performing various data transformations and dataset building.
  • Hands-on experience performing ad-hoc queries on structured data using HiveQL, with partitioning, bucketing and joins in Hive for faster data access (see the sketch after this list).
  • Experience in loading streaming data into HDFS and performing streaming analytics using stream processing platforms such as Flume and the Apache Kafka messaging system.
  • Experience in importing and exporting data using stream processing platforms like Flume and Kafka.
  • Good understanding and knowledge of NoSQL databases like MongoDB, Hbase and Cassandra.
  • Used Spark Data Frames API over Cloudera platform to perform analytics on Hive data.
  • Developed Apache Spark jobs using Scala in test environment for faster data processing.
  • Used Oozie and Zookeeper operational services for coordinating the cluster and scheduling workflows.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Experience in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Experienced in working with AWS, using EC2 for computing and S3 as the storage mechanism.
  • Expertise in designing different types of data masking algorithms and masking techniques, and in producing cut-to-fit solution designs.
  • Expertise in designing data refresh, data loading and data provisioning techniques using required technical components.
  • Expertise in working with MapReduce and with SQL Server and Oracle databases, application servers and database servers.
  • Good experience in the Agile software delivery process using Scrum.
  • Good at developing, publishing and executing test plans and test procedures, and documenting test results.
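
An illustrative sketch (not project code): a minimal PySpark example of the ad-hoc HiveQL/Spark SQL querying and partition-pruning pattern referenced above. The table name "sales", its columns and the date value are hypothetical placeholders, and the actual Spark work described in this summary was done in Scala.

from pyspark.sql import SparkSession

# Spark session with Hive support so existing Hive tables are queryable
spark = (SparkSession.builder
         .appName("hive-adhoc-sketch")
         .enableHiveSupport()
         .getOrCreate())

# DataFrame API over a (hypothetical) partitioned Hive table; filtering on the
# partition column lets Hive/Spark prune partitions for faster access
sales = spark.table("sales")
by_region = (sales
             .filter(sales.sale_date == "2017-01-01")
             .groupBy("region")
             .count())

# The equivalent ad-hoc query expressed in Spark SQL
by_region_sql = spark.sql(
    "SELECT region, COUNT(*) AS cnt FROM sales "
    "WHERE sale_date = '2017-01-01' GROUP BY region")

by_region.show()
by_region_sql.show()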

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Pig, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache

Languages: Python, SQL, Scala.

NoSQL Databases: HBase

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

Methodology: Agile

Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j.

Operating Systems: Windows, Linux.

Third Party Tools: Outline Extractor, SQL Developer, Putty, WINSCP.

PROFESSIONAL EXPERIENCE:

Big Data Developer

Confidential, Austin, TX

  • Collaborate with analysts, architects, database engineers, and other software engineers in designing, architecting, developing, code reviewing, and testing of new software programs and applications in an agile environment.
  • Imported data from different sources such as AWS S3 and the local file system into Spark RDDs.
  • Designed and implemented Operational Data Stores, Data Warehouses, Master Data Management, BI Reporting projects using Big Data technologies, including Hadoop, AWS Redshift, Oracle and SQL server.
  • Worked on importing and exporting data between Oracle and HDFS using Sqoop for analysis, visualization and report generation.
  • Performed ad-hoc queries on structured data using HiveQL and used partitioning, bucketing techniques and joins with Hive for faster data access (see the sketch after this list).
  • Developed Hive queries on external tables to perform various analyses.
  • Used HUE for running Hive queries. Created partitions according to data using Hive to improve performance.
  • Built S3 buckets and managed policies for S3 buckets and used S3 buckets for storage and backup on AWS.
  • Worked on analyzing Hadoop cluster and different Big Data analytics tools including Pig, Hive, HBase and Sqoop.
  • Working knowledge of Big Data technologies such as MapReduce, Hadoop, Hive and Spark.
  • Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Used different Spark modules such as Spark Core, Spark SQL, Spark Streaming, Spark Datasets and DataFrames.
  • Created Impala views on Hive tables for fast access to data.
  • Worked with Hadoop distribution from Cloudera and MapR.
  • Developed Custom Loaders and Storage Classes in PIG to work with various data formats like JSON, XML, CSV etc.
  • Fine-tuned SQL queries and PL/SQL blocks for maximum efficiency and fast response using Oracle hints, explain plans and trace sessions with cost-based and rule-based optimization.
  • Used the JSON and XML SerDes for serialization and deserialization to load JSON and XML data into Hive tables.
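
An illustrative sketch (not project code): a minimal PySpark example of the S3-to-Spark-to-partitioned-table flow described in the bullets above. The bucket, path, columns and table name are hypothetical, and the actual applications were written in Scala.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("s3-json-to-partitioned-table")
         .enableHiveSupport()
         .getOrCreate())

# Read semi-structured JSON landed in S3 into a Spark DataFrame
events = spark.read.json("s3a://example-bucket/raw/events/")

# Light transformation with the DataFrame API
cleaned = (events
           .filter(events.event_type.isNotNull())
           .select("event_id", "event_type", "event_date", "payload"))

# Persist as a table partitioned by date so downstream Hive/Impala queries
# can prune partitions instead of scanning everything
(cleaned.write
        .mode("overwrite")
        .partitionBy("event_date")
        .saveAsTable("analytics.events_cleaned"))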

Environment: Hadoop, HDFS, MapR, Cloudera, Hive, Pig, HBase, Sqoop, Spark, AWS, Oozie, Zookeeper

Hadoop Developer

Confidential, Fairfield, NJ

  • Worked in importing and exporting the data from Relational Database Systems to HDFS by using Sqoop.
  • Developed a common framework to import the data from Teradata to HDFS and to export to Teradata using Sqoop.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data, including Avro, sequence files and XML files.
  • Involved in gathering the requirements, designing, development and testing.
  • Utilized the Cloudera distribution of the Apache Hadoop environment.
  • Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
  • Imported the log data from different servers into HDFS using Flume and developed MapReduce programs for analyzing the data.
  • Used the partitioning pattern in MapReduce to route records into different categories (sketched after this list).
  • Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver for auto-generation of Hive queries for non-technical business users.
  • Integrated data from multiple sources (SQL Server, DB2, Teradata) into the Hadoop cluster and analyzed it through Hive-HBase integration.
  • Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Worked on Hue interface for querying the data.
  • Used Hive to quickly produce results for the reports that were requested.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Developed Pig scripts for data analysis and extended their functionality by developing custom UDFs.
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON and compressed CSV.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the users' MapReduce jobs.
  • Oozie and Zookeeper were used to automate the flow of jobs and coordination in the cluster respectively.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
  • Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
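
An illustrative sketch (not project code): a Hadoop Streaming-style mapper in Python showing the partitioning idea mentioned above, where each record is keyed by its category so the shuffle groups records per category on the reducers. The tab-separated input layout and the category field position are hypothetical; the project's own MapReduce programs are described above without implementation detail.

#!/usr/bin/env python
# mapper.py -- emit (category, record) pairs; Hadoop's default hash partitioner
# then routes all records of one category to the same reducer.
import sys

for line in sys.stdin:
    record = line.rstrip("\n")
    fields = record.split("\t")
    if len(fields) < 3:
        continue  # skip malformed records
    category = fields[2]  # hypothetical: third field holds the category
    print("%s\t%s" % (category, record))

# Submitted via the hadoop-streaming jar with -mapper mapper.py and a matching
# reducer that writes each category's records to its own output.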

Environment: Hadoop, HDFS, Hive, HBase, Pig, SQOOP, Oozie, MySQL, MapReduce, Linux, Eclipse, Zookeeper, Cloudera.

Python Developer

Confidential

  • Translated customer requirements into design specifications and ensured that the requirements were reflected in the software solution.
  • Wrote Python routines to log into the websites and fetch data for selected options.
  • Built application logic using Python.
  • Installed and configured Linux with Apache, Oracle 8i and PHP (LAMP project), along with web services and AWS.
  • Performed web testing and automated testing using Selenium in the test environment and opened bugs in the bug tracking tool.
  • Developed a high-availability, real-time messaging system for financial/banking transactions and developed associated components on Linux, UNIX and AIX platforms.
  • Developed a real-time messaging system for a new payment framework on Linux, UNIX (HP) and AIX platforms.
  • Knowledge of the Server, Network, and Hosting Environment.
  • Involved in developing a RESTful API service using the Python Flask framework (see the sketch after this list).
  • Worked on Python-based test frameworks and test-driven development with automation tools.
  • Held meetings with the client and worked independently on the entire project with limited help from the client.
  • Used Ansible for automating cloud deployment process.
  • Worked on Jenkins continuous integration tool for deployment of project.
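
An illustrative sketch (not project code): a minimal Flask example of the kind of RESTful API endpoint mentioned above. The "payments" resource and its fields are hypothetical stand-ins for the actual service.

from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory stand-in for a real datastore
payments = []

@app.route("/api/payments", methods=["GET"])
def list_payments():
    return jsonify({"payments": payments})

@app.route("/api/payments", methods=["POST"])
def create_payment():
    payload = request.get_json(force=True)
    payment = {"id": len(payments) + 1,
               "amount": payload.get("amount"),
               "currency": payload.get("currency", "USD")}
    payments.append(payment)
    return jsonify(payment), 201

if __name__ == "__main__":
    app.run(debug=True)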

Environment: Linux, Python, PHP, MySQL, Ajax, Shell Script, HTML, CSS.

Jr. Python Developer

Confidential

  • Performed design, participated in code reviews and wrote unit tests in Python.
  • Worked with tuples, dictionaries and object-oriented inheritance features when building algorithms.
  • Interacted with team members on technical programming.
  • Solved production support issues (such as bug fixes and queries).
  • Prepared documentation for the generated reports.
  • Developed a GUI using Python and Django for dynamically displaying test block documentation and other features of the Python code in a web browser.
  • Developed the customer complaints application using the Django framework in Python (sketched after this list).
  • Involved in development of main modules like CSV import, bulk content upload.
  • Successfully migrated the Django database from SQLite to MySQL to PostgreSQL with complete data integrity.
  • Involved in building database model APIs and views using Python to deliver an iterative web-based solution.
  • Used Git as a version-controlling tool to collaborate and coordinate with the team members.
  • Participated in the development of application architecture and blueprints to define application components, platforms, interfaces and development tools.
  • Designed and developed data management system using MySQL
  • Worked on UI using HTML, CSS.
  • Used Git for version control
  • Monitored and troubleshot web applications
  • Performed data analysis, manipulation and reporting using pivot tables and advanced functions in MS EXCEL.
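
An illustrative sketch (not project code): a minimal Django model and view in the spirit of the customer-complaints application described above. The model, field names and view are hypothetical, and the snippet assumes it sits inside a normal Django app (settings, URL routing and migrations omitted).

from django.db import models
from django.http import JsonResponse

class Complaint(models.Model):
    # Hypothetical fields for a customer complaint record
    customer_name = models.CharField(max_length=100)
    description = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)

def complaint_list(request):
    # Return all complaints as JSON for the HTML/CSS front end
    data = [{"id": c.id,
             "customer": c.customer_name,
             "description": c.description}
            for c in Complaint.objects.order_by("-created_at")]
    return JsonResponse({"complaints": data})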

Environment: Python, .NET, PyQuery, MVW, HTML5, Shell Scripting, JSON, Apache Web Server, SQL, UNIX, Windows, and Python libraries.
