
Big Data Developer Resume


SUMMARY

  • Big Data developer with about 7 years of professional IT experience, including about 4 years of Big Data experience in the Health Care, Insurance, and Product domains.
  • Extensive experience working with enterprise Hadoop distributions such as Cloudera and Hortonworks.
  • Experienced in developing and deploying big data applications on both Amazon Web Services and Microsoft Azure.
  • In-depth experience using Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Spark, Storm, Kafka, Oozie, Elasticsearch, HBase, and ZooKeeper.
  • Extensive knowledge of Hadoop architecture and its components.
  • Exposure to Data Lake Implementation using Apache Spark.
  • Developed data pipelines and applied business logic using Spark.
  • Well-versed in Spark components like Spark SQL, MLlib, Spark Streaming, and GraphX.
  • Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
  • Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark (see the sketch after this list).
  • Handled importing and exporting data between RDBMS and HDFS using Sqoop.
  • Extensive experience ingesting streaming data into HDFS using stream processing platforms like Flume and Kafka.
  • Experience in developing a data pipeline using Pig, Sqoop, and Flume to extract data from weblogs and store it in HDFS.
  • Hands-on experience with tools like Oozie and Airflow to orchestrate jobs.
  • Proficient in NoSQL databases including HBase, Cassandra, and MongoDB, and their integration with Hadoop clusters.
  • Expertise in Cluster management and configuring Cassandra Database.
  • Highly familiar with creating Hive tables, Hive joins, and HQL for querying databases, including writing complex Hive UDFs.
  • Accomplished in developing Pig Latin scripts and using Hive Query Language (HQL) for data analytics.
  • Experience in the practical implementation of AWS services including IAM, Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, EBS, ELB, Kinesis, SNS, and Redshift.
  • Worked on data warehousing and ETL tools like Informatica and Talend.
  • Strong knowledge of Enterprise Data Warehousing (EDW).
  • Automated data flow between different systems using NiFi.
  • Experience working with Spring and Hibernate frameworks for JAVA.
  • Worked on various programming languages using IDEs like Eclipse, NetBeans, and IntelliJ.
  • Excelled in using version control tools like PVCS, SVN, VSS, and GIT.
  • Performed web-based UI development using Django, jQuery UI, CSS, and HTML5.
  • Development experience in DBMSs like Oracle, MS SQL Server, Teradata, and MySQL.
  • Experience with best practices of Web services development and Integration (both REST and SOAP).
  • Experienced in using build tools like Ant, Gradle, SBT, and Maven to build and deploy applications into the server.
  • Experience in complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.
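
Illustrative sketch of the Hive/SQL-to-Spark conversion mentioned in the list above (a minimal example, not taken from any specific engagement; the claims table and its member_id / claim_amount columns are hypothetical placeholders):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    object HiveQueryToSpark {
      def main(args: Array[String]): Unit = {
        // SparkSession with Hive support so existing Hive tables are visible.
        val spark = SparkSession.builder()
          .appName("HiveQueryToSpark")
          .enableHiveSupport()
          .getOrCreate()

        // Original HiveQL aggregation, runnable as-is through Spark SQL.
        val viaSql = spark.sql(
          "SELECT member_id, SUM(claim_amount) AS total_claims " +
          "FROM claims GROUP BY member_id")
        viaSql.show(10)

        // The same query expressed as DataFrame transformations.
        val viaDf = spark.table("claims")
          .groupBy("member_id")
          .agg(sum("claim_amount").alias("total_claims"))

        // Drop to the RDD API only when row-level control is needed.
        viaDf.rdd.map(row => row.mkString(",")).take(10).foreach(println)

        spark.stop()
      }
    }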

TECHNICAL SKILLS

Languages/Tools: Java, C++, Scala, VB, XML, HTML/XHTML, HDML, DHTML, Python.

Big Data: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Spark, Kafka, Storm, Cassandra, Solr, Impala.

Cloud: AWS (S3, EC2, EMR, Kinesis), Azure (VM, Cosmos DB, SQL DB)

Operating Systems: Windows 95/98/NT/2000/XP, MS-DOS, UNIX, multiple flavors of Linux.

Databases / NoSQL: Oracle 10g, MS SQL Server 2000, DB2, MS Access, MySQL, Teradata, Cassandra, and MongoDB.

GUI Environment: Swing, AWT, Applets.

Messaging & Web Services Technology: SOAP, WSDL, UDDI, XML, SOA, JAX-RPC, IBM WebSphere MQ v5.3, JMS.

Networking Protocols: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP, POP3.

Testing & Case Tools: JUnit, Log4j, Rational ClearCase, CVS, Ant, Maven, JBuilder.

Version Control Systems: Git, SVN, CVS

PROFESSIONAL EXPERIENCE

Confidential

Big data Developer

Responsibilities:

  • Responsible for developing and supporting Data warehousing operations.
  • Involved in petabyte-scale data migration operations.
  • Experienced in dealing with large-scale HIPAA-compliant data applications and handling sensitive information like PHI (Protected Health Information) in a secure environment.
  • Worked on building and developing ETL pipelines using Spark-based applications (see the sketch after this section).
  • Maintained resources on-premises as well as on the cloud.
  • Utilized various cloud-based services to maintain and monitor various cluster resources.
  • Conducted ETL Data Integration, Cleansing, and Transformations using Apache Kudu and Spark.
  • Used Apache NiFi for file conversions and data processing.
  • Developed applications to map the data between different sources and destinations using Python and Scala.
  • Reviewed and conducted performance tuning on various Spark applications.
  • Responsible for managing data from disparate sources.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Used Hive scripts in Spark for data cleaning and transformation purposes.
  • Responsible for migrating data from various conventional data sources as per the architecture.
  • Developed Spark applications in Scala and Python to migrate the data.
  • Developed Linux based shell scripts to automate the applications.
  • Provided support for building Kafka consumer applications.
  • Performed unit testing and collaborated with the QA team for possible bug fixes.
  • Collaborated with data modelers and other developers during the implementation.
  • Worked in an Agile-based Scrum Methodology.
  • Loaded data into Hive partitioned tables.
  • Exported the analyzed data to relational databases using Kudu for visualization and to generate reports for the Business Intelligence team.

Environment: AWS, Linux, Spark SQL, Python, Scala, CDH 5.12.1, Kudu, Spark, Oozie, Cloudera Manager, Hue, SQL Server, Maven, Git, Agile methodology.
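
A minimal sketch of the kind of Spark-based ETL pipeline referenced above (extract from a relational source, clean with Hive-style SQL in Spark, load to HDFS). The JDBC URL, table, column names, and credentials are hypothetical, and the sketch writes Parquet rather than Kudu to avoid connector-specific setup:

    import org.apache.spark.sql.SparkSession

    object PatientEventsEtl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("PatientEventsEtl")
          .enableHiveSupport()
          .getOrCreate()

        // Extract: pull a source table from SQL Server over JDBC (hypothetical connection details).
        val raw = spark.read
          .format("jdbc")
          .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=clinical")
          .option("dbtable", "dbo.patient_events")
          .option("user", sys.env.getOrElse("ETL_DB_USER", "etl_user"))
          .option("password", sys.env.getOrElse("ETL_DB_PASSWORD", "change_me"))
          .load()

        // Transform: apply Hive-style SQL cleaning rules through Spark SQL.
        raw.createOrReplaceTempView("patient_events_raw")
        val cleaned = spark.sql(
          """SELECT event_id, patient_id, event_type,
            |       CAST(event_ts AS timestamp) AS event_ts
            |FROM patient_events_raw
            |WHERE event_id IS NOT NULL AND patient_id IS NOT NULL""".stripMargin)

        // Load: write partitioned Parquet to HDFS for downstream consumers.
        cleaned.write
          .mode("overwrite")
          .partitionBy("event_type")
          .parquet("hdfs:///data/curated/patient_events")

        spark.stop()
      }
    }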

Confidential

Spark Developer

Responsibilities:

  • Involved in analyzing business requirements and preparing detailed specifications that follow project guidelines for development.
  • Used Sqoop to import data from relational databases like MySQL and Oracle.
  • Involved in importing structured and unstructured data into HDFS.
  • Responsible for fetching real-time data using Kafka and processing using Spark and Scala.
  • Worked on Kafka to import real-time weblogs and ingested the data to Spark Streaming.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations (see the sketch after this section).
  • Worked on Building and implementing real-time streaming ETL pipeline using Kafka Streams API.
  • Worked on Hive to implement Web Interfacing and stored the data in Hive tables.
  • Migrated Map Reduce programs into Spark transformations using Spark and Scala.
  • Experienced with SparkContext, Spark SQL, and Spark on YARN.
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
  • Implemented data quality checks using Spark Streaming, flagging records as passable or bad.
  • Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
  • Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
  • Developed traits, case classes, and related constructs in Scala.
  • Developed Spark scripts using Scala shell commands as per business requirements.
  • Worked on Cloudera distribution and deployed on AWS EC2 Instances.
  • Worked on connecting the Cassandra database to the Amazon EMR File System (EMRFS) to store the data in S3.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Deployed the project on Amazon EMR with S3 connectivity for setting backup storage.
  • Well-versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.
  • Configured workflows that involve Hadoop actions using Oozie.
  • Used Python for pattern matching in build logs to format warnings and errors.
  • Coordinated with the SCRUM team in delivering agreed user stories on time for every sprint.

Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cloudera, Oracle 10g, Linux.
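
A minimal sketch of the Kafka Direct Stream setup referenced above, using the spark-streaming-kafka-0-10 integration; the broker address, topic, consumer group, and log format are hypothetical assumptions:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object WeblogStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("WeblogStream")
        val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",            // hypothetical broker
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "weblog-consumers",                  // hypothetical consumer group
          "auto.offset.reset" -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )

        // Direct stream: one RDD partition per Kafka partition, no receivers.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("weblogs"), kafkaParams)
        )

        // Simple business transformation: count hits per request path in each batch
        // (assumes space-delimited combined log format).
        stream.map(record => (record.value.split(" ")(6), 1L))
          .reduceByKey(_ + _)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }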

Confidential

Hadoop/Big Data Analyst

Responsibilities:

  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Involved in running MapReduce jobs for processing millions of records.
  • Built reusable Hive UDF libraries for business requirements, enabling users to call these UDFs in Hive queries (see the sketch after this section).
  • Responsible for Data Modeling in Cassandra as per our requirement.
  • Managed and scheduled jobs on a Hadoop cluster using Oozie and cron.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Used Elasticsearch and MongoDB for storing and querying the offers and non-offers data.
  • Created UDFs to calculate the pending payment for a given Residential or Small Business customer and used them in Pig and Hive scripts.
  • Deployed and built the application using Maven.
  • Used Python scripting for large-scale text processing utilities.
  • Handled importing of data from various data sources and performed transformations using Hive (external tables, partitioning).
  • Responsible for data modeling in MongoDB to load both structured and unstructured data.
  • Processed unstructured files such as XML and JSON using a custom-built Java API and pushed them into MongoDB.
  • Wrote test cases in MRUnit for unit testing of MapReduce programs.
  • Involved in developing templates and screens in HTML and JavaScript.
  • Developed the XML Schema and Web services for the data maintenance and structures.
  • Built and deployed applications into multiple UNIX based environments and produced both unit and functional test results along with release notes.

Environment: HDFS, MapReduce, Hive, Pig, Cloudera, Impala, Oozie, Greenplum, MongoDB, Cassandra, Kafka, Storm, Maven, Python, Cloudera Manager, Ambari, JDK, J2EE, Struts, JSP, Servlets, Elasticsearch, WebSphere, HTML, XML, JavaScript, MRUnit.
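
A minimal sketch of a reusable Hive UDF of the kind referenced above. The class name and logic are hypothetical; Scala is used only to keep the sketches in this document in one language (Hive UDFs are more commonly written in Java):

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hypothetical reusable Hive UDF: normalizes a customer-type code before analysis.
    // Package the class into a jar, then register it in Hive:
    //   ADD JAR normalize-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize_type AS 'NormalizeCustomerType';
    class NormalizeCustomerType extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase.replaceAll("\\s+", "_"))
      }
    }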

Confidential

Hadoop/Big Data Analyst

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different big data analytic tools including Pig, Hive, HBase, and MapReduce.
  • Extracted everyday customer transaction data from DB2, exported it to Hive, and set up online analytical processing.
  • Installed and configured Hadoop, MapReduce, and HDFS clusters.
  • Created Hive tables, loaded the data, and performed data manipulations using Hive queries in MapReduce execution mode.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into Hive schemas for analysis (see the sketch after this section).
  • Loaded the structured data resulting from MapReduce jobs into Hive tables.
  • Identified issues and behavioral patterns by analyzing the logs using Hive queries.
  • Analyzed and transformed stored data by writing MapReduce or Pig jobs based on business requirements.
  • Used Flume to collect, aggregate, and store weblog data from different sources like web servers, mobile, and network devices, and imported it into HDFS.
  • Developed an Oozie workflow to automate loading data into HDFS and pre-processing it with Pig scripts.
  • Integrated MapReduce with HBase to import bulk data using MR programs.
  • Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster.
  • Worked on developing Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
  • Developed data pipeline using Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Used SQL queries, stored procedures, user-defined functions (UDFs), and database triggers, with tools like SQL Profiler and Database Tuning Advisor (DTA).

Environment: HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, Flume, HBase, Talend, HiveQL, Java, Maven, Avro, Eclipse, and shell scripting.
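
A minimal sketch of a MapReduce cleansing mapper of the kind referenced above. The tab-separated, five-field record layout and the class name are assumptions; Scala is used for consistency with the other sketches, though such jobs are typically written in Java:

    import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
    import org.apache.hadoop.mapreduce.Mapper

    // Hypothetical cleansing mapper: keeps only well-formed, tab-separated records
    // so the output can be loaded into a Hive table for analysis.
    class CleanseRecordsMapper extends Mapper[LongWritable, Text, NullWritable, Text] {

      private val ExpectedFields = 5 // assumed record width

      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
        val fields = value.toString.split("\t", -1)
        // Drop records with the wrong number of fields or any empty field.
        if (fields.length == ExpectedFields && fields.forall(_.trim.nonEmpty)) {
          context.write(NullWritable.get(), value)
        }
      }
    }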
